In finance, volatility most frequently refers to the standard deviation of the continuously
compounded returns of a financial instrument within a specific time horizon. It is used to
quantify the risk of the financial instrument over the specified time period. Volatility is normally
expressed in annualized terms, and it may either be an absolute number ($5) or a fraction of the
mean (5%).
Volatility terminology
Volatility as described here refers to the actual current volatility of a financial instrument for a
specified period (for example 30 days or 90 days). It is the volatility of a financial instrument
based on historical prices over the specified period with the last observation the most recent
price. This phrase is used particularly when one wishes to distinguish between the actual current
volatility of an instrument and
• actual historical volatility, which refers to the volatility of a financial instrument over a
specified period but with the last observation on a date in the past
• actual future volatility, which refers to the volatility of a financial instrument over a
specified period starting at the current time and ending at a future date (normally the
expiry date of an option)
• historical implied volatility which refers to the implied volatility observed from
historical prices of the financial instrument (normally options)
• current implied volatility which refers to the implied volatility observed from current
prices of the financial instrument
• future implied volatility which refers to the implied volatility observed from future
prices of the financial instrument
For a financial instrument whose price follows a Gaussian random walk, or Wiener process, the
width of the distribution increases as time increases. This is because there is an increasing
probability that the instrument's price will be farther away from the initial price as time increases.
However, rather than increase linearly, the volatility increases with the square-root of time as
time increases, because some fluctuations are expected to cancel each other out, so the most
likely deviation after twice the time will not be twice the distance from zero.
Since observed price changes do not follow Gaussian distributions, other distributions such as the
Lévy distribution are often used.[1] These can capture attributes such as "fat tails", though
α-stable distributions with α < 2 have infinite variance.
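The square-root-of-time behaviour can be checked with a short simulation (a minimal sketch; the path counts and step size are arbitrary illustrative choices):

```python
import random
import statistics

def endpoint_std(n_steps, n_paths, step_sd=0.01, seed=1):
    """Standard deviation of the endpoints of n_paths Gaussian random walks."""
    rng = random.Random(seed)
    endpoints = []
    for _ in range(n_paths):
        pos = 0.0
        for _ in range(n_steps):
            pos += rng.gauss(0.0, step_sd)  # i.i.d. Gaussian increments
        endpoints.append(pos)
    return statistics.pstdev(endpoints)

# Quadrupling the horizon roughly doubles (not quadruples) the dispersion:
s1 = endpoint_std(100, 2000)
s4 = endpoint_std(400, 2000)
print(s4 / s1)  # close to 2, the square root of 4
```

Because some fluctuations cancel, the dispersion grows with the square root of the number of steps rather than linearly.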
Volatility for market players
When investing directly in a security, volatility is often viewed as a negative in that it represents
uncertainty and risk. However, with other investing strategies, volatility is often desirable. For
example, if an investor is short at the peaks and long at the lows of a security, the profit will be
greatest when volatility is highest.
In today's markets, it is also possible to trade volatility directly, through the use of derivative
securities such as options and variance swaps. See Volatility arbitrage.
Volatility versus direction
Volatility does not measure the direction of price changes, merely how dispersed they are
expected to be. This is because when calculating standard deviation (or variance), all differences
are squared, so that negative and positive differences are combined into one quantity. Two
instruments with different volatilities may have the same expected return, but the instrument with
higher volatility will have larger swings in value at the end of a given period of time.
For example, a lower volatility stock may have an expected (average) return of 7%, with annual
volatility of 5%. This would indicate returns from approximately -3% to 17% most of the time
(19 times out of 20, or 95%). A higher volatility stock, with the same expected return of 7% but
with annual volatility of 20%, would indicate returns from approximately -33% to 47% most of
the time (19 times out of 20, or 95%).
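The ranges above come from a two-standard-deviation band around the expected return (2 is a round-number stand-in for the exact 95% Gaussian quantile of about 1.96); as a quick sketch:

```python
def return_band_95(expected_return, volatility):
    """Approximate 95% range of returns: mean +/- 2 standard deviations."""
    return (expected_return - 2 * volatility, expected_return + 2 * volatility)

print(return_band_95(0.07, 0.05))  # about -3% to 17%: the low-volatility stock
print(return_band_95(0.07, 0.20))  # about -33% to 47%: the high-volatility stock
```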
Volatility is a poor measure of risk, as explained by Peter Carr, "it is only a good measure of risk
if you feel that being rich then being poor is the same as being poor then rich".
Volatility over time
Although the Black-Scholes equation assumes predictable constant volatility, constant volatility
is not observed in real markets. Among the models that relax this assumption are Bruno Dupire's
local volatility, Poisson processes where volatility jumps to new levels with a predictable
frequency, and the increasingly popular Heston model of stochastic volatility.[2]
Many types of assets are known to experience periods of high and low volatility. That is, during
some periods prices go up and down quickly, while during other times they might not seem to
move at all.
Periods when prices fall quickly (a crash) are often followed by prices going down even more, or
going up by an unusual amount. Also, a time when prices rise quickly (a bubble) may often be
followed by prices going up even more, or going down by an unusual amount.
The converse behavior, 'doldrums', can last for a long time as well.
Most typically, extreme movements do not appear 'out of nowhere'; they are presaged by larger
movements than usual. This is termed autoregressive conditional heteroskedasticity. Of course,
whether such large movements have the same direction, or the opposite, is more difficult to say.
And an increase in volatility does not always presage a further increase—the volatility may
simply go back down again.
Mathematical definition
The annualized volatility σ is the standard deviation of the instrument's logarithmic returns in a
year.
The generalized volatility σT for time horizon T in years is expressed as
σT = σ√T.
Therefore, if the daily logarithmic returns of a stock have a standard deviation of σSD and the
time period of returns is P (in years), the annualized volatility is
σ = σSD/√P.
A common assumption is that P = 1/252 (there are 252 trading days in any given year). Then, if
σSD = 0.01, the annualized volatility is
σ = 0.01/√(1/252) = 0.01·√252 ≈ 0.1587,
or about 15.9%.
Note that the formula used to annualize returns is not deterministic, but is an extrapolation valid
for a random walk process whose steps have finite variance. Generally, the relation between
volatility in different time scales is more complicated, involving the Lévy stability exponent α:
σT = T^(1/α) σ.
If α = 2 the Wiener-process scaling relation is recovered, but some people believe α < 2 for
financial activities such as stocks, indexes and so on. This was discovered by Benoît Mandelbrot,
who looked at cotton prices and found that they followed a Lévy alpha-stable distribution with
α = 1.7. (See New Scientist, 19 April 1997.) Mandelbrot's conclusion is, however, not accepted
by mainstream financial econometricians.
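The annualization rule can be sketched directly from the formula above (assuming the random-walk scaling, α = 2):

```python
import math

def annualize(sigma_period, period_in_years):
    """Scale a per-period volatility to annual terms: sigma / sqrt(P)."""
    return sigma_period / math.sqrt(period_in_years)

# Daily volatility of 1% with 252 trading days per year:
sigma_annual = annualize(0.01, 1.0 / 252)
print(round(sigma_annual, 4))  # 0.1587
```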
Quick-and-dirty (percentage) volatility measurement
Suppose you notice that a market price index, which is approximately 10,000, moves about 100
points a day on average. This would constitute a 1% (up or down) daily movement.
To annualize this, you can use the "rule of 16", that is, multiply by 16 to get 16% as the overall
(annual) volatility. The rationale for this is that 16 is the square root of 256, which is
approximately the number of actual trading days in a year. This uses the statistical result that the
standard deviation of the sum of n independent variables (with equal standard deviations) is √n
times the standard deviation of the individual variables.
It also takes the average magnitude of the observations as an approximation to the standard
deviation of the variables. Under the assumption that the variables are normally distributed with
mean zero and standard deviation σ, the expected value of the magnitude of the observations is
√(2/π)σ = 0.798σ, thus the observed average magnitude of the observations may be taken as a
rough approximation to σ.
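Both steps of this quick estimate — annualizing by the rule of 16 and refining the average absolute move into a standard deviation — can be sketched as follows, using the figures from the text:

```python
import math

index_level = 10_000
avg_abs_move = 100                           # average daily move, in points

daily_pct_move = avg_abs_move / index_level  # 1% per day
# Rule of 16: multiply by sqrt(256), roughly the number of trading days:
annual_vol_rough = daily_pct_move * 16
# The average |X| of a zero-mean normal is sqrt(2/pi)*sigma ~= 0.798*sigma,
# so the daily standard deviation is slightly above the average move:
daily_sigma = daily_pct_move / math.sqrt(2 / math.pi)

print(annual_vol_rough)        # 0.16, i.e. 16% annual volatility
print(round(daily_sigma, 5))   # the refined daily sigma, about 0.01253
```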
Wiener process
A single realization of a one-dimensional Wiener process
The scaled process Vt = (1/α)W(α²t) is a Wiener process for any nonzero constant α. The
Wiener measure is the probability law on the space of continuous functions g, with g(0) = 0,
induced by the Wiener process. An integral based on Wiener measure may be called a Wiener
integral.
Properties of a one-dimensional Wiener process
Basic properties
The unconditional probability density function at a fixed time t is that of a normal distribution
with mean 0 and variance t:
f(x) = (1/√(2πt)) exp(−x²/(2t)).
The results for the expectation and variance follow immediately from the definition that
increments have a normal distribution, centred at zero. Thus
E[Wt] = 0 and Var(Wt) = E[Wt²] − E[Wt]² = t.
The results for the covariance and correlation follow from the definition that non-overlapping
increments are independent, of which only the property that they are uncorrelated is used.
Suppose that t1 < t2. Then
cov(Wt1, Wt2) = t1 and corr(Wt1, Wt2) = √(t1/t2).
Self-similarity
Brownian scaling
Time reversal
If a polynomial p(x, t) satisfies the partial differential equation
(∂/∂t + ½ ∂²/∂x²) p(x, t) = 0,
then the stochastic process Mt = p(Wt, t) is a martingale. For example, Wt² − t is a
martingale, which shows that the quadratic variation of the Wiener process on [0, t] is equal
to t.
About functions p(x, t) more general than polynomials, see local martingales.
Some properties of sample paths
The set of all functions w with these properties is of full Wiener measure. That is, a path (sample
function) of the Wiener process has all these properties almost surely.
Qualitative properties
• For every ε>0, the function w takes both (strictly) positive and (strictly)
negative values on (0,ε).
• The function w is continuous everywhere but differentiable nowhere (like the
Weierstrass function).
• Points of local maximum of the function w are a dense countable set; the
maximum values are pairwise different; each local maximum is sharp in the
following sense: if w has a local maximum at t, then
(w(s) − w(t))/|s − t| → −∞ as s → t.
Quantitative properties
Law of the iterated logarithm
lim sup |Wt|/√(2t log log t) = 1, almost surely, as t → ∞.
Modulus of continuity
Local modulus of continuity:
lim sup |W(t+h) − W(t)|/√(2h log log(1/h)) = 1, almost surely, as h → 0+.
Local time
The image of the Lebesgue measure on [0, t] under the map w (the pushforward measure) has a
density Lt(·). Thus
∫₀ᵗ f(w(s)) ds = ∫ f(x) Lt(x) dx (integrating over all real x)
for a wide class of functions ƒ (namely: all continuous functions; all locally integrable functions;
all non-negative measurable functions). The density Lt is (more exactly, can and will be chosen
to be) continuous. The number Lt(x) is called the local time at x of w on [0, t]. It is strictly
positive for all x of the interval (a, b) where a and b are the least and the greatest value of w on
[0, t], respectively. (For x outside this interval the local time evidently vanishes.) Treated as a
function of two variables x and t, the local time is still continuous. Treated as a function of t
(while x is fixed), the local time is a singular function corresponding to a nonatomic measure on
the set of zeros of w.
These continuity properties are fairly non-trivial. Consider that the local time can also be defined
(as the density of the pushforward measure) for a smooth function. Then, however, the density is
discontinuous, unless the given function is monotone. In other words, there is a conflict between
good behavior of a function and good behavior of its local time. In this sense, the continuity of
the local time of the Wiener process is another manifestation of non-smoothness of the
trajectory.
Related processes
The generator of a Brownian motion is ½ times the Laplace-Beltrami operator. The
image above is of the Brownian motion on a special manifold: the surface of a
sphere.
The process Xt = μt + σWt is called a Wiener process with drift μ and infinitesimal variance σ².
These processes exhaust continuous Lévy processes.
Two random processes on the time interval [0, 1] appear, roughly speaking, when conditioning
the Wiener process to vanish on both ends of [0,1]. With no further conditioning, the process
takes both positive and negative values on [0, 1] and is called Brownian bridge. Conditioned also
to stay positive on (0, 1), the process is called Brownian excursion. In both cases a rigorous
treatment involves a limiting procedure, since the formula P(A|B) = P(A ∩ B)/P(B) does not
apply when P(B) = 0.
The stochastic process Xt = exp(μt + σWt) is called geometric Brownian motion. It is a
stochastic process which is used to model processes that can never take on negative values,
such as the value of stocks.
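A short simulation illustrates why this process suits quantities that cannot go negative: being an exponential of a real-valued process, every path stays strictly positive. The drift and volatility values here are arbitrary illustrative choices:

```python
import math
import random

def gbm_path(s0, mu, sigma, n_steps, dt, seed=0):
    """Geometric Brownian motion X_t = s0 * exp(mu*t + sigma*W_t)."""
    rng = random.Random(seed)
    w, path = 0.0, [s0]
    for i in range(1, n_steps + 1):
        w += rng.gauss(0.0, math.sqrt(dt))   # Wiener-process increment
        t = i * dt
        path.append(s0 * math.exp(mu * t + sigma * w))
    return path

path = gbm_path(s0=100.0, mu=0.05, sigma=0.4, n_steps=1000, dt=1 / 252)
print(min(path) > 0)  # True: the path never touches zero
```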
The stochastic process Xt = e^(−t) W(e^(2t)) is distributed like the Ornstein–Uhlenbeck
process.
The local time Lt(x) treated as a random function of x (while t is constant) is a random process
described by Ray-Knight theorems in terms of Bessel processes.
Brownian martingales
Let A be an event related to the Wiener process (more formally: a set, measurable with respect to
the Wiener measure, in the space of functions), and Xt the conditional probability of A given the
Wiener process on the time interval [0, t] (more formally: the Wiener measure of the set of
trajectories whose concatenation with the given partial trajectory on [0, t] belongs to A). Then the
process Xt is a continuous martingale. Its martingale property follows immediately from the
definitions, but its continuity is a very special fact – a special case of a general theorem stating
that all Brownian martingales are continuous. A Brownian martingale is, by definition, a
martingale adapted to the Brownian filtration; and the Brownian filtration is, by definition, the
filtration generated by the Wiener process.
Time change
Every continuous martingale (starting at the origin) is a time changed Wiener process.
Example. 2Wt = V(4t) where V is another Wiener process (different from W but distributed like
W).
Especially, a nonnegative continuous martingale has a finite limit (as t → ∞) almost surely.
Everything stated in this subsection for martingales holds also for local martingales.
Change of measure
A wide class of continuous semimartingales (especially, of diffusion processes) is related to the
Wiener process via a combination of time change and change of measure.
Using this fact, the qualitative properties stated above for the Wiener process can be generalized
to a wide class of continuous semimartingales.
Complex-valued Wiener process
The complex-valued Wiener process may be defined as a complex-valued random process of the
form Zt = Xt + iYt where Xt,Yt are independent Wiener processes (real-valued).
Self-similarity
Brownian scaling, time reversal, time inversion: the same as in the real-valued case.
Rotation invariance: for every complex number c such that |c|=1 the process cZt is another
complex-valued Wiener process.
Time change
If f is an entire function then the process f(Zt) − f(0) is a time-changed complex-valued Wiener
process.
Implied volatility
In financial mathematics, the implied volatility of an option contract is the volatility implied by
the market price of the option based on an option pricing model. In other words, it is the
volatility that, when used in a particular pricing model, yields a theoretical value for the option
equal to the current market price of that option. Non-option financial instruments that have
embedded optionality, such as an interest rate cap, can also have an implied volatility. Implied
volatility, a forward-looking measure, differs from historical volatility because the latter is
calculated from known past returns of a security.
Motivation
An ordinary option pricing model, such as Black-Scholes, uses a variety of inputs to derive a
theoretical value for an option. Inputs to pricing models vary depending on the type of option
being priced and the pricing model used. However, in general, the value of an option depends on
an estimate of the future realized volatility, σ, of the underlying. Or, mathematically:
C = f(σ, ·)
where C is the theoretical value of an option, and f is a pricing model that depends on σ, along
with other inputs.
The function f is monotonically increasing in σ, meaning that a higher value for volatility results
in a higher theoretical value of the option. Conversely, by the inverse function theorem, there can
be at most one value for σ that, when applied as an input to f(σ, ·), will result in a particular
value for C.
Put in other terms, assume that there is some inverse function g = f⁻¹, such that
σ = g(C*, ·)
where C* is the market price for an option. The value σ is the volatility implied by the market
price, or the implied volatility.
Example
A standard call option contract, C, on 100 shares of non-dividend-paying XYZ Corp. stock is
struck at $50 and expires in 32 days. The risk-free interest rate is 5%. XYZ stock is currently
trading at $51.25 and the current market price of the option is $2.00. Using a standard
Black-Scholes pricing model, the volatility implied by the market price is 18.7%, or σ = 18.7%.
To verify, we apply the implied volatility back into the pricing model and generate a theoretical
value of $2.0004:
f(σ = 18.7%, ·) = $2.0004.
Solving the inverse pricing model function
In general, a pricing model function f does not have a closed-form solution for its inverse g, so
an iterative root-finding technique is used instead. One choice, Newton's method, requires the
partial derivative of the option's theoretical value with respect to volatility, ∂C/∂σ, which is
also known as vega (see The Greeks). If the pricing model function yields a closed-form solution
for vega, which is the case for the Black-Scholes model, then Newton's method can be more
efficient. However, for most practical pricing models, such as a binomial model, this is not the
case and vega must be derived numerically. When forced to solve for vega numerically, it usually
turns out that Brent's method is more efficient as a root-finding technique.
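The inversion described above can be sketched with the Black-Scholes call formula and plain bisection (monotonicity in σ makes bisection safe; Brent's or Newton's method would merely converge faster). The figures are those of the XYZ example, and the 32/365 day-count convention is an assumption:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call on a non-dividend-paying stock."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def implied_vol(price, S, K, T, r, lo=1e-4, hi=2.0):
    """Invert bs_call in sigma by bisection; the price is increasing in sigma."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, T, r, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# XYZ example: spot 51.25, strike 50, 32 days, r = 5%, market price $2.00
iv = implied_vol(2.00, S=51.25, K=50.0, T=32 / 365, r=0.05)
print(round(iv * 100, 2))  # about 18.7 (% implied volatility)
```

Feeding the recovered σ back through bs_call reproduces the $2.00 market price, mirroring the verification step in the example.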
Implied volatility as measure of relative value
Often, the implied volatility of an option is a more useful measure of the option's relative value
than its price. This is because the price of an option depends most directly on the price of its
underlying security. If an option is held as part of a delta neutral portfolio, that is, a portfolio that
is hedged against small moves in the underlier's price, then the next most important factor in
determining the value of the option will be its implied volatility.
Implied volatility is so important that options are often quoted in terms of volatility rather than
price, particularly between professional traders.
Example
A call option is trading at $1.50 with the underlying trading at $42.05. The implied volatility of
the option is determined to be 18.0%. A short time later, the option is trading at $2.10 with the
underlying at $43.34, yielding an implied volatility of 17.2%. Even though the option's price is
higher at the second measurement, it is still considered cheaper on a volatility basis. This is
because the underlying needed to hedge the call option can be sold for a higher price.
Implied volatility as a price
Another way to look at implied volatility is to think of it as a price, not as a measure of future
stock moves. In this view it simply is a more convenient way to communicate option prices than
currency. Prices are different in nature from statistical quantities: We can estimate volatility of
future underlying returns using any of a large number of estimation methods, however the
number we get is not a price. A price requires two counterparts, a buyer and a seller. Prices are
determined by supply and demand. Statistical estimates depend on the time-series and the
mathematical structure of the model used. It is a mistake to confuse a price, which implies a
transaction, with the result of a statistical estimation which is merely what comes out of a
calculation. Implied volatilities are prices: They have been derived from actual transactions. Seen
in this light, it should not be surprising that implied volatilities might not conform to what a
particular statistical model would predict.
Non-constant implied volatility
In general, options based on the same underlier but with different strike value and expiration
times will yield different implied volatilities. This is generally viewed as evidence that an
underlier's volatility is not constant but instead depends on factors such as the price level of the
underlier, the underlier's recent variance, and the passage of time. See stochastic volatility and
volatility smile for more information.
Volatility instruments
Volatility instruments are financial instruments that track the value of implied volatility of other
derivative securities. For instance, the CBOE Volatility Index (VIX) is calculated from a
weighted average of implied volatilities of various options on the S&P 500 Index. There are also
other commonly referenced volatility indices such as the VXN index (Nasdaq 100 index futures
volatility measure), the QQV (QQQQ volatility measure), and IVX, the Implied Volatility Index
(an expected stock volatility over a future period for US securities and exchange-traded
instruments), as well as options and futures derivatives based directly on these volatility indices
themselves.
Random walk
Example of eight random walks in one dimension starting at 0. The plot shows the
current position on the line (vertical axis) versus the time steps (horizontal axis).
An animated example of three Brownian motion-like random walks on a torus,
starting at the centre of the image.
In its simplest form, a random walk takes steps at fixed times, so that the position Xn is indexed
by the natural numbers. However, some walks take their steps at random times, and in that case
the position Xt is defined for the continuum of times t ≥ 0. Specific cases or limits of random
walks include the drunkard's walk and Lévy flight. Random walks are related to diffusion
models and are a fundamental topic in discussions of Markov processes. Several properties of
random walks, including dispersal distributions, first-passage times and encounter rates, have
been extensively studied.
Lattice random walk
A popular random walk model is that of a random walk on a regular lattice, where at each step
the walk jumps to another site according to some probability distribution. In simple random
walk, the walk can only jump to neighbouring sites of the lattice. In symmetric simple random
walk on a locally finite lattice, the probabilities of walk jumping to any one of its neighbours are
the same. The most well-studied example is of random walk on the d-dimensional integer
lattice ℤ^d.
A particularly elementary and concrete random walk is the random walk on the integers ℤ, which
starts at S0 = 0 and at each step moves by ±1 with equal probability. To define this walk formally,
take independent random variables Z1, Z2, ..., each of which is 1 with probability 1/2 and −1
with probability 1/2, and set Sn = Z1 + Z2 + ... + Zn. This sequence {Sn} is called the simple
random walk on ℤ.
This walk can be illustrated as follows. Say you flip a fair coin. If it lands on heads, you move
one to the right on the number line. If it lands on tails, you move one to the left. So after five
flips, you have the possibility of landing on 1, −1, 3, −3, 5, or −5. You can land on 1 by flipping
three heads and two tails in any order. There are 10 possible ways of landing on 1. Similarly,
there are 10 ways of landing on −1 (by flipping three tails and two heads), 5 ways of landing on
3 (by flipping four heads and one tail), 5 ways of landing on −3 (by flipping four tails and one
head), 1 way of landing on 5 (by flipping five heads), and 1 way of landing on −5 (by flipping
five tails). See the figure below for an illustration of this example.
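The landing counts in this example can be verified by enumerating all 2⁵ = 32 head/tail sequences:

```python
from itertools import product
from collections import Counter

# Each 5-flip sequence is a tuple of +1 (heads) / -1 (tails) steps;
# summing a tuple gives the final position on the number line.
counts = Counter(sum(steps) for steps in product((1, -1), repeat=5))
print(dict(counts))
# 10 ways to land on 1 or -1, 5 ways for 3 or -3, 1 way for 5 or -5
```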
What can we say about the position Sn of the walk after n steps? Of course, it is random, so we
cannot calculate it. But we may say quite a bit about its distribution. It is not hard to see that the
expectation E(Sn) of Sn is zero. For example, this follows by the additivity property of
expectation: E(Sn) = E(Z1) + E(Z2) + ... + E(Zn) = 0. A similar calculation, using the
independence of the random variables Zn, shows that E(Sn²) = n. This hints that E|Sn|, the
expected translation distance after n steps, should be of the order of √n. In fact, E|Sn|
approaches √(2n/π) as n grows.
Suppose we draw a line some distance from the origin of the walk. How many times will the
random walk cross the line if permitted to continue walking forever? The following, perhaps
surprising theorem is the answer: simple random walk on ℤ will cross every point an infinite
number of times. This result has many names: the level-crossing phenomenon, recurrence or the
gambler's ruin. The reason for the last name is as follows: if you are a gambler with a finite
amount of money playing a fair game against a bank with an infinite amount of money, you will
surely lose. The amount of money you have will perform a random walk, and it will almost
surely, at some time, reach 0 and the game will be over.
If a and b are positive integers, then the expected number of steps until a one-dimensional simple
random walk starting at 0 first hits b or −a is ab. The probability that this walk will hit b before
−a is a/(a + b), which can be derived from the fact that simple random walk is a martingale.
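Both facts — the expected exit time ab and the hitting probability a/(a + b) — can be checked by simulation; the values a = 2, b = 3 and the trial count are arbitrary choices:

```python
import random

def exit_stats(a, b, n_trials, seed=7):
    """Simulate simple random walks from 0 until they hit b or -a."""
    rng = random.Random(seed)
    total_steps, hits_b = 0, 0
    for _ in range(n_trials):
        pos, steps = 0, 0
        while -a < pos < b:
            pos += rng.choice((-1, 1))  # one +/-1 step of the walk
            steps += 1
        total_steps += steps
        hits_b += (pos == b)
    return total_steps / n_trials, hits_b / n_trials

mean_steps, p_hit_b = exit_stats(a=2, b=3, n_trials=20000)
print(mean_steps)  # close to a*b = 6
print(p_hit_b)     # close to a/(a+b) = 0.4
```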
Some of the results mentioned above can be derived from properties of Pascal's triangle. The
number of different walks of n steps where each step is +1 or −1 is clearly 2^n. For the simple
random walk, each of these walks is equally likely. In order for Sn to be equal to a number k it is
necessary and sufficient that the number of +1 steps in the walk exceeds the number of −1 steps
by k. Thus, the number of walks which satisfy Sn = k is precisely the number of ways of choosing
(n + k)/2 elements from an n element set (for this to be non-zero, it is necessary that n + k be an
even number).
k             −5  −4  −3  −2  −1   0   1   2   3   4   5
P[S0 = k]                          1
2 P[S1 = k]                    1       1
2² P[S2 = k]               1       2       1
2³ P[S3 = k]           1       3       3       1
2⁴ P[S4 = k]       1       4       6       4       1
2⁵ P[S5 = k]   1       5      10      10       5       1
The central limit theorem and the law of the iterated logarithm describe important aspects of the
behavior of simple random walk on ℤ.
Gaussian random walk
A random walk having a step size that varies according to a normal distribution is used as a
model for real-world time series data such as financial markets. The Black-Scholes formula for
modeling equity option prices, for example, uses a Gaussian random walk as an underlying
assumption.
Here, the step size is the inverse normal Φ⁻¹(z; μ, σ), where 0 ≤ z ≤ 1 is a uniformly distributed
random number, and μ and σ are the mean and standard deviation of the normal distribution,
respectively.
The root mean squared expected translation distance after n steps is σ√n.
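A quick Monte Carlo check of the σ√n law for a zero-mean Gaussian walk (walk length, path count and σ are arbitrary illustrative choices):

```python
import math
import random

def rms_distance(n_steps, n_walks, mu=0.0, sigma=2.0, seed=3):
    """Root mean squared endpoint of zero-mean Gaussian random walks."""
    rng = random.Random(seed)
    total_sq = 0.0
    for _ in range(n_walks):
        pos = 0.0
        for _ in range(n_steps):
            pos += rng.gauss(mu, sigma)  # normally distributed step
        total_sq += pos * pos
    return math.sqrt(total_sq / n_walks)

n, sigma = 100, 2.0
r = rms_distance(n, 2000, sigma=sigma)
print(r)  # close to sigma * sqrt(n) = 20
```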
Higher dimensions
Random walk in two dimensions.
Random walk in two dimensions with more, and smaller, steps. In the limit, for very
small steps, one obtains the Brownian motion.
Imagine now a drunkard walking randomly in a city. The city is effectively infinite and arranged
in a square grid, and at every intersection the drunkard chooses one of the four possible routes
(including the one he came from) with equal probability. Formally, this is a random walk on the
set of all points in the plane with integer coordinates. Will the drunkard ever get back to
his home from the bar? It turns out that he will. This is the high dimensional equivalent of the
level crossing problem discussed above. The probability of returning to the origin decreases as
the number of dimensions increases. In three dimensions, the probability decreases to roughly
34%. A derivation, along with values of p(d) are discussed here:
http://mathworld.wolfram.com/PolyasRandomWalkConstants.html.
The trajectory of a random walk is the collection of sites it visited, considered as a set with
disregard to when the walk arrived at the point. In one dimension, the trajectory is simply all
points between the minimum height the walk achieved and the maximum (both are, on average,
on the order of √n). In higher dimensions the set has interesting geometric properties. In fact, one
gets a discrete fractal, that is a set which exhibits stochastic self-similarity on large scales, but on
small scales one can observe "jaggedness" resulting from the grid on which the walk is
performed. The two books of Lawler referenced below are a good source on this topic.
A Wiener process is a stochastic process with similar behaviour to Brownian motion, the
physical phenomenon of a minute particle diffusing in a fluid. (Sometimes the Wiener process is
called "Brownian motion", although this is strictly speaking a confusion of a model with the
phenomenon being modeled.)
A Wiener process is the scaling limit of random walk in dimension 1. This means that if you take
a random walk with very small steps you get an approximation to a Wiener process (and, less
accurately, to Brownian motion). To be more precise, if the step size is ε, one needs to take a
walk of length L/ε² to approximate a Wiener process walk of length L. As the step size tends to 0
(and the number of steps increases proportionally) random walk converges to a Wiener process
in an appropriate sense. Formally, if B is the space of all paths of length L with the maximum
topology, and if M is the space of measure over B with the norm topology, then the convergence
is in the space M. Similarly, a Wiener process in several dimensions is the scaling limit of
random walk in the same number of dimensions.
A random walk is a discrete fractal, but a Wiener process trajectory is a true fractal, and there is
a connection between the two. For example, take a random walk until it hits a circle of radius r
times the step length. The average number of steps it performs is r². This fact is the discrete
version of the fact that a Wiener process walk is a fractal of Hausdorff dimension 2.[2] In two
dimensions, the average number of points the same random walk has on the boundary of its
trajectory is r4/3. This corresponds to the fact that the boundary of the trajectory of a Wiener
process is a fractal of dimension 4/3, a fact predicted by Mandelbrot using simulations but
proved only in 2000 (Science, 2000).
A Wiener process enjoys many symmetries random walk does not. For example, a Wiener
process walk is invariant to rotations, but random walk is not, since the underlying grid is not
(random walk is invariant to rotations by 90 degrees, but Wiener processes are invariant to
rotations by, for example, 17 degrees too). This means that in many cases, problems on random
walk are easier to solve by translating them to a Wiener process, solving the problem there, and
then translating back. On the other hand, some problems are easier to solve with random walks
due to its discrete nature.
Random walk and Wiener process can be coupled, namely manifested on the same probability
space in a dependent way that forces them to be quite close. The simplest such coupling is the
Skorokhod embedding, but other, more precise couplings exist as well.
The convergence of a random walk toward the Wiener process is controlled by the central limit
theorem. For a particle in a known fixed position at t = 0, the theorem tells us that after a large
number of independent steps in the random walk, the walker's position is distributed according to
a normal distribution of total variance
σ² = (t/δt) ε²,
where t is the time elapsed since the start of the random walk, ε is the size of a step of the
random walk, and δt is the time elapsed between two successive steps.
This corresponds to the Green function of the diffusion equation that controls the Wiener
process, which demonstrates that, after a large number of steps, the random walk converges
toward a Wiener process.
In 3D, the variance corresponding to the Green's function of the diffusion equation is
σ² = 6 D t.
By equalizing this quantity with the variance associated to the position of the random walker,
one obtains the equivalent diffusion coefficient to be considered for the asymptotic Wiener
process toward which the random walk converges after a large number of steps:
D = ε²/(6 δt).
Remark: the two expressions of the variance above correspond to the distribution associated to
the vector that links the two ends of the random walk, in 3D. The variance associated to each
component Rx, Ry or Rz is only one third of this value (still in 3D).
Applications
The following are some applications of random walk:
• In economics, the "random walk hypothesis" is used to model shares prices
and other factors. Empirical studies found some deviations from this
theoretical model, especially in short term and long term correlations. See
share prices.
• In population genetics, random walk describes the statistical properties of
genetic drift.
• In physics, random walks are used as simplified models of physical Brownian
motion and the random movement of molecules in liquids and gases. See for
example diffusion-limited aggregation. Also in physics, random walks and
some of the self-interacting walks play a role in quantum field theory.
• In mathematical ecology, random walks are used to describe individual
animal movements, to empirically support processes of biodiffusion, and
occasionally to model population dynamics.
• In polymer physics, random walk describes an ideal chain. It is the simplest
model to study polymers.
• In other fields of mathematics, random walk is used to calculate solutions to
Laplace's equation, to estimate the harmonic measure, and for various
constructions in analysis and combinatorics.
• In computer science, random walks are used to estimate the size of the Web.
At the 2006 World Wide Web Conference, Bar-Yossef et al. published their
findings and algorithms for this problem (the paper received the best paper
award that year).
In all these cases, random walk is often substituted for Brownian motion.
• In brain research, random walks and reinforced random walks are used to
model cascades of neuron firing in the brain.
• In vision science, fixational eye movements are well described by a random
walk.
• In psychology, random walks accurately explain the relation between the
time needed to make a decision and the probability that a certain decision
will be made. (Nosofsky, 1997)
• Random walks can be used to sample from a state space which is unknown or
very large, for example to pick a random page off the internet or, for
research into working conditions, a random worker in a given country.
• When this last approach is used in computer science, it is known as Markov
chain Monte Carlo (MCMC). Often, sampling from some complicated state
space also allows one to obtain a probabilistic estimate of the space's size.
Estimating the permanent of a large matrix of zeros and ones was the first
major problem tackled using this approach.
• In wireless networking, random walk is used to model node movement.
• Motile bacteria engage in a biased random walk.
• Random walk is used to model gambling.
• In physics, random walks underlie the method of Fermi estimation.
• During World War II a random walk was used to model the distance that an
escaped prisoner of war would travel in a given time.
Probabilistic interpretation
A one-dimensional random walk can also be looked at as a Markov chain whose state space is
given by the integers i = 0, ±1, ±2, …. For some number p satisfying 0 < p < 1, the transition
probabilities are Pi,i+1 = p and Pi,i−1 = 1 − p.
Dimension   ⟨R⟩   ⟨R²⟩       Transient
1           0     Nb²        No
2           0     Nb²        No
3           0     ...        Yes

Dimension   ⟨R⟩   ⟨R²⟩
2           ?     2Nb²
3           ?     (3/2)Nb²
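The claim that the walk's spread grows with the number of steps can be checked exactly for the simple one-dimensional walk: enumerating every equally likely path gives ⟨R⟩ = 0 and ⟨R²⟩ = Nb², the square-root-of-time growth of the root-mean-square displacement. A minimal sketch (the function name is mine):

```python
from itertools import product

def walk_moments(n_steps, b=1.0):
    """Exact <R> and <R^2> of a simple 1-D random walk by enumerating
    all 2**n_steps equally likely sequences of +/-b steps."""
    total, total_sq = 0.0, 0.0
    for steps in product((-b, b), repeat=n_steps):
        r = sum(steps)           # end-to-end displacement R
        total += r
        total_sq += r * r
    n_paths = 2 ** n_steps
    return total / n_paths, total_sq / n_paths

mean_r, mean_r2 = walk_moments(10)   # symmetric walk: <R> = 0, <R^2> = N b^2
```

For N = 10 and b = 1 the enumeration returns exactly (0.0, 10.0), so the RMS displacement grows as √N.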
Variance swap
A variance swap is an over-the-counter financial derivative that allows one to speculate on or
hedge risks associated with the magnitude of movement, i.e. the volatility, of some underlying
product, such as an exchange rate, interest rate, or stock index.
One leg of the swap will pay an amount based upon the realised variance of the price changes of
the underlying product. Conventionally, these price changes will be daily log returns, based upon
the most commonly used closing price. The other leg of the swap will pay a fixed amount, which
is the strike, quoted at the deal's inception. Thus the net payoff to the counterparties will be the
difference between these two and will be settled in cash at the expiration of the deal, though
some cash payments will likely be made along the way by one or the other counterparty to
maintain agreed upon margin.
Structure and features
The features of a variance swap include:
• the variance strike
• the realised variance
• the vega notional: Like other swaps, the payoff is determined based on a notional
amount that is never exchanged. However, in the case of a variance swap, the notional
amount is specified in terms of vega, to convert the payoff into dollar terms.
The payoff of a variance swap is given as follows:
Nvar × (σ²realised − σ²strike)
where:
• Nvar = variance notional (a.k.a. variance units),
• σ²realised = annualised realised variance, and
• σ²strike = variance strike.[1]
The annualised realised variance is calculated based on a prespecified set of sampling points over
the period. It does not always coincide with the classic statistical definition of variance as the
contract terms may not subtract the mean. For example, suppose that there are n+1 sample points
S0, S1, ..., Sn. Define, for i = 1 to n, Ri = ln(Si / Si−1), the natural log returns. Then
• σ²realised = (A / n) × Σi=1..n Ri²
where A is an annualisation factor normally chosen to be approximately the number of sampling
points in a year (commonly 252). It can be seen that subtracting the mean return will decrease the
realised variance. If this is done, it is common to use n − 1 as the divisor rather than n,
corresponding to an unbiased estimate of the sample variance.
It is market practice to determine the number of contract units as follows:
Nvar = Nvol / (2 × σstrike)
where Nvol is the corresponding vega notional for a volatility swap.[1] This makes the payoff of a
variance swap comparable to that of a volatility swap, another less popular instrument used to
trade volatility.
Pricing and valuation
The variance swap may be hedged, and hence priced, using a portfolio of European call and put
options with weights inversely proportional to the square of the strike.[2][3]
Any volatility smile model which prices vanilla options can therefore be used to price the
variance swap. For example, using the Heston model, a closed-form solution can be derived for
the fair variance swap rate. Care must be taken with the behaviour of the smile model in the
wings as this can have a disproportionate effect on the price.
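The 1/K² weighting can be verified numerically: with a flat Black–Scholes volatility and zero rates, integrating out-of-the-money option prices against 2/(T K²) recovers σ² as the fair variance strike. A sketch under those stated assumptions (flat smile, r = 0, simple trapezoidal integration; all names are mine):

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_price(is_call, s, k, t, sigma, r=0.0):
    """Black-Scholes price of a European option (no dividends)."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    call = s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)
    return call if is_call else call - s + k * math.exp(-r * t)

def replicated_variance(s0, t, sigma, k_lo=1.0, k_hi=2000.0, n=20000):
    """Fair variance strike from a strip of out-of-the-money options
    weighted by 1/K^2 (zero rates, so the forward equals the spot)."""
    f = s0
    dk = (k_hi - k_lo) / n
    total = 0.0
    for i in range(n + 1):
        k = k_lo + i * dk
        price = bs_price(k >= f, s0, k, t, sigma)   # puts below F, calls above
        w = 0.5 if i in (0, n) else 1.0             # trapezoid end weights
        total += w * price / k ** 2 * dk
    return 2.0 / t * total

kv = replicated_variance(100.0, 1.0, 0.2)           # close to sigma^2 = 0.04
```

With σ = 20% the integral comes out very close to 0.04, i.e. a fair strike of roughly 20% in volatility terms, as expected for a flat smile.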
We can derive the payoff of a variance swap using Itō's lemma. We first assume that the
underlying stock follows a geometric Brownian motion: dSt = μSt dt + σSt dWt. Applying Itō's
lemma to ln St gives d(ln St) = (μ − σ²/2) dt + σ dWt, so that σ² dt = 2 (dSt/St − d ln St). We can
see that the total variance consists of a continuously rebalanced holding of 1/St shares of stock
and a short log contract. A short log contract position is equal to being short a futures contract
and a collection of puts and calls:
Taking integrals and setting the value of the variance swap equal to zero, we can rearrange the
formula to solve for the fair variance swap strike:
Where:
S0 is the initial price of the underlying security
S* is the at the money price
K is the strike of each option in the collection of options used
Uses
Many find variance swaps interesting or useful for their purity. An alternative way of
speculating on volatility is with an option, but if one only has interest in volatility risk, this
strategy will require constant delta hedging, so that direction risk of the underlying security is
approximately removed. What is more, a replicating portfolio of a variance swap would require
an entire strip of options, which would be very costly to execute. Finally, one might often find
the need to be regularly rolling this entire strip of options so that it remains centered around the
current price of the underlying security.
The advantage of variance swaps is that they provide pure exposure to the volatility of the
underlying price, as opposed to call and put options which may carry directional risk (delta). The
profit and loss from a variance swap depends directly on the difference between realized and
implied volatility.[4]
Another aspect that some speculators may find interesting is that the quoted strike is determined
by the implied volatility smile in the options market, whereas the ultimate payout will be based
upon actual realized variance. Historically, implied variance has been above realized variance,[5]
a phenomenon known as the variance risk premium, creating an opportunity for volatility
arbitrage, in this case known as the rolling short variance trade. For the same reason, these swaps
can be used to hedge Options on Realized Variance.
Volatility arbitrage
In finance, volatility arbitrage (or vol arb) is a type of statistical arbitrage that is implemented by
trading a delta neutral portfolio of an option and its underlier. The objective is to take advantage
of differences between the implied volatility of the option, and a forecast of future realized
volatility of the option's underlier. In volatility arbitrage, volatility is used as the unit of relative
measure rather than price - that is, traders attempt to buy volatility when it is low and sell
volatility when it is high.[1][2]
Overview
To an option trader engaging in volatility arbitrage, an option contract is a way to speculate in
the volatility of the underlying rather than a directional bet on the underlier's price. If a trader
buys options as part of a delta-neutral portfolio, he is said to be long volatility. If he sells
options, he is said to be short volatility. So long as the trading is done delta-neutral, buying an
option is a bet that the underlier's future realized volatility will be high, while selling an option is
a bet that future realized volatility will be low. Because of put-call parity, it doesn't matter if the
options traded are calls or puts. This is true because put-call parity posits a risk-neutral
equivalence relationship between a call, a put and some amount of the underlier. Therefore,
being long a delta neutral call results in the same returns as being long a delta neutral put.
Forecast volatility
To engage in volatility arbitrage, a trader must first forecast the underlier's future realized
volatility. This is typically done by computing the historical daily returns for the underlier for a
given past sample such as 252 days, the number of trading days in a year. The trader may also
use other factors, such as whether the period was unusually volatile, or if there are going to be
unusual events in the near future, to adjust his forecast. For instance, if the current 252-day
volatility for the returns on a stock is computed to be 15%, but it is known that an important
patent dispute will likely be settled in the next year, the trader may decide that the appropriate
forecast volatility for the stock is 18%.
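The historical estimate described above is just the annualised sample standard deviation of daily log returns; a minimal sketch (the 252-day window and √252 scaling follow the text, the function name is mine):

```python
import math

def historical_volatility(prices, trading_days=252):
    """Annualised historical volatility: unbiased sample standard
    deviation of daily log returns, scaled by sqrt(trading_days)."""
    rets = [math.log(prices[i] / prices[i - 1]) for i in range(1, len(prices))]
    n = len(rets)
    mean = sum(rets) / n
    var = sum((r - mean) ** 2 for r in rets) / (n - 1)
    return math.sqrt(var * trading_days)

# Synthetic year of alternating +/-1% daily log returns:
prices = [100.0]
for i in range(252):
    prices.append(prices[-1] * math.exp(0.01 if i % 2 == 0 else -0.01))
vol = historical_volatility(prices)   # ~0.159, i.e. about 15.9% annualised
```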
Market (Implied) Volatility
As described in option valuation techniques, there are a number of factors that are used to
determine the theoretical value of an option. However, in practice, the only two inputs to the
model that change during the day are the price of the underlier and the volatility. Therefore, the
theoretical price of an option can be expressed as:
where S is the price of the underlier and σ is the estimate of future volatility. Because the
theoretical price of an option is a monotonically increasing function of σ, there is a
corresponding monotonically increasing function that expresses the volatility implied by the
option's market price. Or, in other words: when all other inputs including the stock price are held
constant, there exists no more than one implied volatility for each market price for the option.
Because the implied volatility of an option can remain constant even as the underlier's value changes,
traders use it as a measure of relative value rather than the option's market price. For instance, if
a trader can buy an option whose implied volatility is 10%, it's common to say that the trader
can "buy the option for 10%". Conversely, if the trader can sell an option whose implied
volatility is 20%, it is said the trader can "sell the option at 20%".
For example, assume a call option is trading at $1.90 with the underlier's price at $45.50,
yielding an implied volatility of 17.5%. A short time later, the same option might trade at $2.50
with the underlier's price at $46.36, yielding an implied volatility of 16.8%. Even though the
option's price is higher at the second measurement, the option is still considered cheaper because
the implied volatility is lower. This is because the trader can sell the stock needed to hedge the
long call at a higher price.
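Because the option price is a monotonically increasing function of volatility, the implied volatility in an example like this can be recovered by simple bisection. A sketch using the quoted $45.50 underlier and 17.5% vol; the strike (45), expiry (0.25y) and rate (1%) are my illustrative assumptions, not from the text:

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s, k, t, r, sigma):
    """Black-Scholes European call price (no dividends)."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)

def implied_vol(price, s, k, t, r, lo=1e-6, hi=5.0, tol=1e-8):
    """Invert Black-Scholes for sigma by bisection; valid because the
    call price is monotonically increasing in volatility."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(s, k, t, r, mid) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Round trip: price at 17.5% vol, then recover the vol from the price.
price = bs_call(45.50, 45.0, 0.25, 0.01, 0.175)
iv = implied_vol(price, 45.50, 45.0, 0.25, 0.01)
```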
Mechanism
Armed with a forecast volatility, and capable of measuring an option's market price in terms of
implied volatility, the trader is ready to begin a volatility arbitrage trade. A trader looks for
options where the implied volatility is either significantly lower than or higher than the
forecast realized volatility of the underlier. In the first case, the trader buys the option and
hedges with the underlier to make a delta-neutral portfolio. In the second case, the trader sells the
option and then hedges the position.
Over the holding period, the trader will realize a profit on the trade if the underlier's realized
volatility is closer to his forecast than it is to the market's forecast (i.e. the implied volatility).
The profit is extracted from the trade through the continual re-hedging required to keep the
portfolio delta neutral.
Black–Scholes
From Wikipedia, the free encyclopedia
Let C(S, t) denote the price of a European call and P(S, t) the price of a European put
option.
σ, the volatility of the stock; this is the square root of the quadratic variation
of the stock's log price process.
Mathematical model
Simulated Geometric Brownian Motions with Parameters from Market Data
As per the model assumptions above, we assume that the underlying asset (typically the stock)
follows a geometric Brownian motion. That is,
where W is a Brownian motion; the dW term here stands in for any and all sources of uncertainty
in the price history of the stock.
The payoff of an option V(S, T) at maturity is known. To find its value at an earlier time we
need to know how V evolves as a function of S and t. By Itō's lemma for two variables we have
Now consider a trading strategy under which one holds a single option and continuously trades
in the stock in order to hold −∂V/∂S shares. At time t, the value of these holdings will be
Π = V − S ∂V/∂S.
The composition of this portfolio, called the delta-hedge portfolio, will vary from time-step to
time-step. Let R denote the accumulated profit or loss from following this strategy. Then over the
time period [t, t + dt], the instantaneous profit or loss is
If we now substitute in for Π and equate the left- and right-hand sides of the equation, we obtain
the Black–Scholes partial differential equation (PDE):
With the assumptions of the Black–Scholes model, this partial differential equation holds
whenever V is twice differentiable with respect to S and once with respect to t.
Other derivations
See also: Martingale pricing
Above we used the method of arbitrage-free pricing ("delta-hedging") to derive some PDE
governing option prices given the Black–Scholes model. It is also possible to use a risk-
neutrality argument. This latter method gives the price as the expectation of the option payoff
under a particular probability measure, called the risk-neutral measure, which differs from the
real world measure.
Black–Scholes formula
The solution of the PDE gives the value of the option at any earlier time t < T. In order to
solve the PDE we transform the equation into a diffusion equation which may be solved using
standard methods. To this end we introduce the change-of-variable transformation
The terminal condition C(S,T) = max(S − K,0) now becomes an initial condition
where
and
Substituting for u, x, and τ, we obtain the value of a call option in terms of the Black–Scholes
parameters:
where
The price of a put option may be computed from this by put-call parity and simplifies to
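The resulting call and put values, and the put-call parity relation just mentioned, can be checked numerically. A self-contained sketch of the standard formulas (constant rate, no dividends; function names are mine):

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes(s, k, t, r, sigma):
    """Black-Scholes values of a European call and put (no dividends)."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    call = s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)
    put = k * math.exp(-r * t) * norm_cdf(-d2) - s * norm_cdf(-d1)
    return call, put

c, p = black_scholes(100.0, 100.0, 1.0, 0.05, 0.2)
```

For S = K = 100, r = 5%, σ = 20%, T = 1 the call is worth about 10.45, and c − p equals S − K e^(−rT) up to rounding error, as put-call parity requires.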
Greeks
The Greeks under Black–Scholes are given in closed form, below:
        Calls                                                  Puts
delta   N(d1)                                                  N(d1) − 1
gamma   N′(d1) / (S σ √(T − t))                                (same as calls)
vega    S N′(d1) √(T − t)                                      (same as calls)
theta   −S N′(d1) σ / (2 √(T − t)) − r K e^(−r(T − t)) N(d2)   −S N′(d1) σ / (2 √(T − t)) + r K e^(−r(T − t)) N(−d2)
rho     K (T − t) e^(−r(T − t)) N(d2)                          −K (T − t) e^(−r(T − t)) N(−d2)
Note that the gamma and vega formulas are the same for calls and puts. This can be seen directly
from put-call parity.
In practice, some sensitivities are usually quoted in scaled-down terms, to match the scale of
likely changes in the parameters. For example, rho is often reported divided by 10,000 (1bp rate
change), vega by 100 (1 vol point change), and theta by 365 or 252 (1 day decay based on either
calendar days or trading days per year).
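The remark that gamma and vega coincide for calls and puts can be verified without the closed forms, by finite-differencing the call and put prices directly: put and call differ only by the parity term, which is linear in S and independent of σ, so both sensitivities must agree. A sketch (step sizes and names are my choices):

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs(s, k, t, r, sigma, call=True):
    """Black-Scholes price; the put follows from put-call parity."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    c = s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)
    return c if call else c - s + k * math.exp(-r * t)

def gamma_fd(s, k, t, r, sigma, call=True, h=1e-3):
    """Second central difference in the spot price."""
    return (bs(s + h, k, t, r, sigma, call) - 2.0 * bs(s, k, t, r, sigma, call)
            + bs(s - h, k, t, r, sigma, call)) / h ** 2

def vega_fd(s, k, t, r, sigma, call=True, h=1e-5):
    """First central difference in the volatility."""
    return (bs(s, k, t, r, sigma + h, call)
            - bs(s, k, t, r, sigma - h, call)) / (2.0 * h)

gc = gamma_fd(100.0, 100.0, 1.0, 0.05, 0.2, call=True)
gp = gamma_fd(100.0, 100.0, 1.0, 0.05, 0.2, call=False)
vc = vega_fd(100.0, 100.0, 1.0, 0.05, 0.2, call=True)
vp = vega_fd(100.0, 100.0, 1.0, 0.05, 0.2, call=False)
```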
Extensions of the model
The above model can be extended for variable (but deterministic) rates and volatilities. The
model may also be used to value European options on instruments paying dividends. In this case,
closed-form solutions are available if the dividend is a known proportion of the stock price.
American options and options on stocks paying a known cash dividend (in the short term, more
realistic than a proportional dividend) are more difficult to value, and a choice of solution
techniques is available (for example lattices and grids).
Instruments paying continuous yield dividends
For options on indexes, it is reasonable to make the simplifying assumption that dividends are
paid continuously, and that the dividend amount is proportional to the level of the index.
The dividend payment paid over the time period [t, t + dt] is then modelled as q St dt, for some
constant q (the dividend yield), where now
F = S e^((r − q)(T − t)) is the modified forward price that occurs in the terms d1 and d2:
Exactly the same formula is used to price options on foreign exchange rates, except that now q
plays the role of the foreign risk-free interest rate and S is the spot exchange rate. This is the
Garman-Kohlhagen model (1983).
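A sketch of the continuous-yield variant; reading q as the foreign risk-free rate and S as the spot exchange rate turns this into the Garman-Kohlhagen formula, and setting q = 0 recovers the plain Black–Scholes call (function names are mine):

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_yield_call(s, k, t, r, q, sigma):
    """European call on an asset paying a continuous yield q.
    With q read as the foreign risk-free rate and s as the spot FX rate,
    this is the Garman-Kohlhagen (1983) formula."""
    d1 = (math.log(s / k) + (r - q + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s * math.exp(-q * t) * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)

c0 = bs_yield_call(100.0, 100.0, 1.0, 0.05, 0.0, 0.2)   # q = 0: plain BS call
cq = bs_yield_call(100.0, 100.0, 1.0, 0.05, 0.03, 0.2)  # dividends lower the call
```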
Instruments paying discrete proportional dividends
It is also possible to extend the Black–Scholes framework to options on instruments paying
discrete proportional dividends. This is useful when the option is struck on a single stock.
A typical model is to assume that a proportion δ of the stock price is paid out at pre-determined
times t1, t2, .... The price of the stock is then modelled as
where n(t) is the number of dividends that have been paid by time t.
The price of a call option on such a stock is again
where now
The Black–Scholes model disagrees with reality in a number of ways, some significant. It is
widely employed as a useful approximation, but proper application requires understanding its
limitations – blindly following the model exposes the user to unexpected risk.
Among the most significant limitations are:
• the underestimation of extreme moves, yielding tail risk, which can be
hedged with out-of-the-money options;
• the assumption of instant, cost-less trading, yielding liquidity risk, which is
difficult to hedge;
• the assumption of a stationary process, yielding volatility risk, which can be
hedged with volatility hedging;
• the assumption of continuous time and continuous trading, yielding gap risk,
which can be hedged with Gamma hedging.
In short, while in the Black–Scholes model one can perfectly hedge options by simply Delta
hedging, in practice there are many other sources of risk.
Results using the Black–Scholes model differ from real world prices due to simplifying
assumptions of the model. One significant limitation is that in reality security prices do not
follow a strict stationary log-normal process, nor is the risk-free interest rate actually known (and is
not constant over time). The variance has been observed to be non-constant leading to models
such as GARCH to model volatility changes. Pricing discrepancies between empirical and the
Black–Scholes model have long been observed in options that are far out-of-the-money,
corresponding to extreme price changes; such events would be very rare if returns were
lognormally distributed, but are observed much more often in practice.
Nevertheless, Black–Scholes pricing is widely used in practice,[4] for it is easy to calculate and
explicitly models the relationship of all the variables. It is a useful approximation, particularly
when analyzing the directionality that prices move when crossing critical points. It is used both
as a quoting convention and a basis for more refined models. Although volatility is not constant,
results from the model are often useful in practice and helpful in setting up hedges in the correct
proportions to minimize risk. Even when the results are not completely accurate, they serve as a
first approximation to which adjustments can be made.
One reason for the popularity of the Black–Scholes model is that it is robust in that it can be
adjusted to deal with some of its failures. Rather than considering some parameters (such as
volatility or interest rates) as constant, one considers them as variables, and thus added sources
of risk. This is reflected in the Greeks (the change in option value for a change in these
parameters, or equivalently the partial derivatives with respect to these variables), and hedging
these Greeks mitigates the risk caused by the non-constant nature of these parameters. Other
defects cannot be mitigated by modifying the model, however, notably tail risk and liquidity risk,
and these are instead managed outside the model, chiefly by minimizing these risks and by stress
testing.
Additionally, rather than assuming a volatility a priori and computing prices from it, one can use
the model to solve for volatility, which gives the implied volatility of an option at given prices,
durations and exercise prices. Solving for volatility over a given set of durations and strike prices
one can construct an implied volatility surface. In this application of the Black–Scholes model, a
coordinate transformation from the price domain to the volatility domain is obtained. Rather than
quoting option prices in terms of dollars per unit (which are hard to compare across strikes and
tenors), option prices can thus be quoted in terms of implied volatility, which leads to trading of
volatility in option markets.
The volatility smile
Main article: Volatility smile
One of the attractive features of the Black–Scholes model is that the parameters in the model
(other than the volatility) — the time to maturity, the strike, and the current underlying price —
are unequivocally observable. All other things being equal, an option's theoretical value is a
monotonic increasing function of implied volatility. By computing the implied volatility for
traded options with different strikes and maturities, the Black–Scholes model can be tested. If the
Black–Scholes model held, then the implied volatility for a particular stock would be the same
for all strikes and maturities. In practice, the volatility surface (the three-dimensional graph of
implied volatility against strike and maturity) is not flat. The typical shape of the implied
volatility curve for a given maturity depends on the underlying instrument. Equities tend to have
skewed curves: compared to at-the-money, implied volatility is substantially higher for low
strikes, and slightly lower for high strikes. Currencies tend to have more symmetrical curves,
with implied volatility lowest at-the-money, and higher volatilities in both wings. Commodities
often have the reverse behaviour to equities, with higher implied volatility for higher strikes.
Despite the existence of the volatility smile (and the violation of all the other assumptions of the
Black–Scholes model), the Black–Scholes PDE and Black–Scholes formula are still used
extensively in practice. A typical approach is to regard the volatility surface as a fact about the
market, and use an implied volatility from it in a Black–Scholes valuation model. This has been
described as using "the wrong number in the wrong formula to get the right price."[5] This
approach also gives usable values for the hedge ratios (the Greeks).
Even when more advanced models are used, traders prefer to think in terms of volatility as it
allows them to evaluate and compare options of different maturities, strikes, and so on.
Valuing bond options
Black–Scholes cannot be applied directly to bond securities because of pull-to-par. As the bond
reaches its maturity date, all of the prices involved with the bond become known, thereby
decreasing its volatility, and the simple Black–Scholes model does not reflect this process. A
large number of extensions to Black–Scholes, beginning with the Black model, have been used
to deal with this phenomenon.
Interest rate curve
In practice, interest rates are not constant - they vary by tenor, giving an interest rate curve which
may be interpolated to pick an appropriate rate to use in the Black–Scholes formula. Another
consideration is that interest rates vary over time. This volatility may make a significant
contribution to the price, especially of long-dated options.
Short stock rate
It is not free to take a short stock position. Similarly, it may be possible to lend out a long stock
position for a small fee. In either case, this can be treated as a continuous dividend for the
purposes of a Black–Scholes valuation.
Alternative formula derivation
Let S0 be the current price of the underlying stock and S the price when the option matures at
time T. Then S0 is known, but S is a random variable. Assume that
ln(S / S0) is a normal random variable with mean uT and variance σ²T. It follows that the mean of S is
E[S] = S0 e^(qT)
for some constant q (independent of T). Now a simple no-arbitrage argument shows that the
theoretical future value of a derivative paying one share of the stock at time T, and so with
payoff S, is S0 e^(rT),
where r is the risk-free interest rate. This suggests making the identification q = r for the purpose
of pricing derivatives. Define the theoretical value of a derivative as the present value of the
expected payoff in this sense. For a call option with exercise price K this discounted expectation
(using risk-neutral probabilities) is
The derivation of the formula for C is facilitated by the following lemma: Let Z be a standard
normal random variable and let b be an extended real number. Define
where N is the standard normal cumulative distribution function. In the special case b = −∞, we
have
Now let
and use the corollary to the lemma to verify the statement above about the mean of S. Define
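The statement of the lemma did not survive extraction; the standard Gaussian identity presumably intended here (in my notation, consistent with the special case b = −∞ mentioned above) is:

```latex
% Gaussian truncation lemma (standard identity; notation mine):
% for Z ~ N(0,1) and an extended real number b,
\mathbb{E}\!\left[e^{aZ}\,\mathbf{1}_{\{Z > b\}}\right]
  = e^{a^{2}/2}\, N(a - b),
% proved by completing the square in the Gaussian density:
% e^{az} e^{-z^{2}/2} = e^{a^{2}/2} e^{-(z-a)^{2}/2}.
% In the special case b = -\infty this reduces to
% \mathbb{E}[e^{aZ}] = e^{a^{2}/2},
% which verifies the stated mean of S with a = \sigma\sqrt{T}.
```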
Remarks on notation
The reader is warned of the inconsistent notation that appears in this article. Thus the
letter S is used as:
(1) a constant denoting the current price of the stock
(2) a real variable denoting the price at an arbitrary time
It is also used in the meaning of (4) with a subscript denoting time, but here the subscript is
merely a mnemonic.
In the partial derivatives, the letters in the numerators and denominators are, of course, real
variables, and the partial derivatives themselves are, initially, real functions of real variables. But
after the substitution of a stochastic process for one of the arguments they become stochastic
processes.
The Black–Scholes PDE is, initially, a statement about the stochastic process S, but when S is
reinterpreted as a real variable, it becomes an ordinary PDE. It is only then that we can ask about
its solution.
The parameter u that appears in the discrete-dividend model and the elementary derivation is not
the same as the parameter μ that appears elsewhere in the article. For the relationship between
them see Geometric Brownian motion.
Poisson process
A Poisson process, named after the French mathematician Siméon-Denis Poisson (1781–1840),
is a stochastic process in which events occur continuously and independently of one another (the
word "event" as used here is not the same as the concept of an event commonly used in
probability theory). Examples that are well-modeled as Poisson processes include the radioactive decay of
atoms, telephone calls arriving at a switchboard, page view requests to a website, and rainfall.
The Poisson process is a collection {N(t) : t ≥ 0} of random variables, where N(t) is the number
of events that have occurred up to time t (starting from time 0). The number of events between
time a and time b is given as N(b) − N(a) and has a Poisson distribution. Each realization of the
process {N(t)} is a non-negative integer-valued step function that is non-decreasing, but for
intuitive purposes it is usually easier to think of it as a point pattern on [0,∞) (the points in time
where the step function jumps, i.e. the points in time where an event occurs).
The Poisson process is a continuous-time process: its discrete-time counterpart is the Bernoulli
process. Poisson processes are also examples of continuous-time Markov processes. A Poisson
process is a pure-birth process, the simplest example of a birth-death process. By the
aforementioned interpretation as a random point pattern on [0, ∞) it is also a point process on the
real half-line.
Definition
The basic form of Poisson process, often referred to simply as "the Poisson process", is a
continuous-time counting process {N(t), t ≥ 0} that possesses the following properties:
• N(0) = 0
• Independent increments (the numbers of occurrences counted in disjoint intervals are
independent from each other)
• Stationary increments (the probability distribution of the number of occurrences counted
in any time interval only depends on the length of the interval)
• No counted occurrences are simultaneous.
Consequences of this definition include:
• The probability distribution of N(t) is a Poisson distribution.
• The probability distribution of the waiting time until the next occurrence is an
exponential distribution.
Other types of Poisson process are described below.
Types
Homogeneous
The number of events in any interval of length τ follows a Poisson distribution:
P[N(t + τ) − N(t) = k] = e^(−λτ) (λτ)^k / k!  for k = 0, 1, 2, …,
where N(t + τ) − N(t) is the number of events in time interval (t, t + τ].
Just as a Poisson random variable is characterized by its scalar parameter λ, a homogeneous
Poisson process is characterized by its rate parameter λ, which is the expected number of
"events" or "arrivals" that occur per unit time.
N(t) is a sample homogeneous Poisson process, not to be confused with a density or distribution
function.
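A homogeneous Poisson process can be simulated directly from the definition by drawing independent exponential interarrival times with mean 1/λ; over many paths the mean count on [0, t] approaches λt. A sketch (function names are mine):

```python
import random

def simulate_poisson(rate, horizon, rng):
    """Sample one path of a homogeneous Poisson process on [0, horizon] by
    accumulating i.i.d. exponential interarrival times with mean 1/rate."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return times
        times.append(t)

# With rate 5 per unit time, the mean count on [0, 1] is close to 5:
rng = random.Random(42)
counts = [len(simulate_poisson(5.0, 1.0, rng)) for _ in range(10000)]
mean_count = sum(counts) / len(counts)
```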
Non-homogeneous
Main article: Non-homogeneous Poisson process
In general, the rate parameter may change over time; such a process is called a non-
homogeneous Poisson process or inhomogeneous Poisson process. In this case, the
generalized rate function is given as λ(t). Now the expected number of events between time a
and time b is
λa,b = ∫[a,b] λ(t) dt.
Thus, the number of arrivals in the time interval (a, b], given as N(b) − N(a), follows a Poisson
distribution with associated parameter λa,b.
A homogeneous Poisson process may be viewed as a special case when λ(t) = λ, a constant rate.
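A non-homogeneous process with bounded rate λ(t) ≤ λmax can be simulated by thinning: simulate a homogeneous process at rate λmax and keep each point t with probability λ(t)/λmax (the Lewis-Shedler algorithm; a sketch, names mine):

```python
import math
import random

def simulate_nhpp(rate_fn, rate_max, horizon, rng):
    """Thinning: draw candidate points from a homogeneous process at
    rate_max, then accept each point t with probability rate_fn(t)/rate_max."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate_max)
        if t > horizon:
            return times
        if rng.random() < rate_fn(t) / rate_max:
            times.append(t)

# lambda(t) = 2 + sin(t) on [0, 2*pi]: expected count = integral = 4*pi
rng = random.Random(7)
mean_events = sum(
    len(simulate_nhpp(lambda t: 2.0 + math.sin(t), 3.0, 2.0 * math.pi, rng))
    for _ in range(5000)
) / 5000.0
```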
Spatial
A further variation on the Poisson process, called the spatial Poisson process, introduces a spatial
dependence on the rate function, which is given as λ(x, t) where x ∈ V for some vector space V
(e.g. R2 or R3). For any set S ⊆ V (e.g. a spatial region) with finite measure, the number of
events occurring inside this region can be modelled as a Poisson process with associated rate
function λS(t) such that
λS(t) = ∫S λ(x, t) dx.
In the special case that this generalized rate function is a separable function of time and space,
we have:
λ(x, t) = f(x) λ(t)
for some function f(x). (If this is not the case, λ(t) can be scaled appropriately.) Now f(x)
represents the spatial probability density function of these random events in the following sense.
The act of sampling this spatial Poisson process is equivalent to sampling a Poisson process with
rate function λ(t), and associating with each event a random vector sampled from the probability
density function f(x). A similar result can be shown for the general (non-separable) case.
Properties
In its most general form, the only two conditions for a counting process to be a Poisson process
are:
• Orderliness: P[N(t + Δt) − N(t) > 1] = o(Δt) as Δt → 0, which roughly means
that arrivals don't occur simultaneously (but this is actually a
mathematically stronger statement).
• Memorylessness (also called evolution without after-effects): the number of arrivals
occurring in any bounded interval of time after time t is independent of the number of
arrivals occurring before time t.
These seemingly unrestrictive conditions actually impose a great deal of structure in the Poisson
process. In particular, they imply that the times between consecutive events (called interarrival
times) are independent random variables. For the homogeneous Poisson process, these inter-
arrival times are exponentially distributed with parameter λ (mean 1/λ).
Proof: Let τ1 be the first arrival time of the Poisson process. Its distribution satisfies
P(τ1 > t) = P(N(t) = 0) = e^(−λt).
Also, the memorylessness property entails that the number of events in any time interval is
independent of the number of events in any other interval that is disjoint from it. This latter
property is known as the independent increments property of the Poisson process.
To illustrate the exponentially-distributed inter-arrival times property, consider a homogeneous
Poisson process N(t) with rate parameter λ, and let Tk be the time of the kth arrival, for k = 1, 2, 3,
... . Clearly the number of arrivals before some fixed time t is less than k if and only if the waiting
time until the kth arrival is more than t. In symbols, the event [N(t) < k] occurs if and only if the
event [Tk > t] occurs. Consequently the probabilities of these events are the same:
In particular, consider the waiting time until the first arrival. Clearly that time is more than t if
and only if the number of arrivals before time t is 0. Combining this latter property with the
above probability distribution for the number of homogeneous Poisson process events in a fixed
interval gives P(T1 > t) = P(N(t) = 0) = e^(−λt).
Consequently, the waiting time until the first arrival T1 has an exponential distribution, and is
thus memoryless. One can similarly show that the other interarrival times Tk − Tk−1 share the
same distribution. Hence, they are independent, identically-distributed (i.i.d.) random variables
with parameter λ > 0 and expected value 1/λ. For example, if the average rate of arrivals is 5 per
minute, then the average waiting time between arrivals is 1/5 minute.
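The exponential inter-arrival property can be illustrated with a short simulation; this is a minimal sketch (the article itself contains no code), with the rate of 5 arrivals per minute taken from the example above:

```python
import random

def simulate_arrival_times(rate, n_arrivals, rng):
    """Simulate the first n arrival times of a homogeneous Poisson
    process by summing i.i.d. exponential inter-arrival times."""
    times = []
    t = 0.0
    for _ in range(n_arrivals):
        t += rng.expovariate(rate)  # inter-arrival time ~ Exp(rate), mean 1/rate
        times.append(t)
    return times

rng = random.Random(42)
lam = 5.0  # 5 arrivals per minute, as in the text
times = simulate_arrival_times(lam, 100_000, rng)
gaps = [b - a for a, b in zip([0.0] + times, times)]
mean_gap = sum(gaps) / len(gaps)
print(round(mean_gap, 3))  # close to 1/5 minute
```

With 100,000 simulated arrivals the sample mean of the gaps settles very close to the theoretical 1/λ = 0.2 minutes.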
[edit] Examples
The following examples are well-modeled by the Poisson process:
• The arrival of "customers" in a queue.
• The number of raindrops falling over an area.
• The number of photons hitting a photodetector.
• The number of telephone calls arriving at a switchboard, or at an automatic phone-
switching system.
• The number of particles emitted via radioactive decay by an unstable substance, where
the rate decays as the substance stabilizes.
• The long-term behavior of the number of web page requests arriving at a server, except
for unusual circumstances such as coordinated denial of service attacks or flash crowds.
Such a model assumes homogeneity as well as weak stationarity.
2. Obtain the squares of the error and regress them on a constant and q lagged values:
ε̂t² = α̂0 + α̂1ε̂²t−1 + ... + α̂qε̂²t−q
In that case, the GARCH(p, q) model (where p is the order of the GARCH terms σ² and q is the
order of the ARCH terms ε²) is given by:
σt² = α0 + α1ε²t−1 + ... + αqε²t−q + β1σ²t−1 + ... + βpσ²t−p
Generally, when testing for heteroskedasticity in econometric models, the best test is the White
test. However, when dealing with time series data, this means testing for ARCH errors (as
described above) and GARCH errors (below).
Prior to GARCH there was EWMA, which has since been superseded by GARCH, although some
practitioners use both.
[edit] GARCH(p, q) model specification
The lag length p of a GARCH(p, q) process is established in three steps:
1. Estimate the best fitting AR(q) model
yt = a0 + a1yt−1 + ... + aqyt−q + εt.
2. Compute and plot the autocorrelations of ε² by
ρ(i) = Σt=i+1..T (ε̂²t − σ̂²)(ε̂²t−i − σ̂²) / Σt=1..T (ε̂²t − σ̂²)², where σ̂² is the sample mean of ε̂²t.
3. The asymptotic (that is, for large samples) standard deviation of ρ(i) is 1/√T.
Individual values that are larger than this indicate GARCH errors. To estimate the total
number of lags, use the Ljung-Box test until the values of these are less than, say, 10%
significant. The Ljung-Box Q-statistic follows a χ² distribution with n degrees of freedom
when the squared residuals are uncorrelated.
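The lag-selection diagnostics above can be sketched in plain Python; this is an illustrative sketch (not from the article), using synthetic white-noise residuals as a stand-in for estimated ARCH residuals:

```python
import math
import random

def acf(x, lag):
    """Sample autocorrelation of the series x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return cov / var

def ljung_box_q(x, max_lag):
    """Ljung-Box Q-statistic over lags 1..max_lag; under the null of no
    autocorrelation it is approximately chi-squared distributed."""
    n = len(x)
    return n * (n + 2) * sum(acf(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1))

rng = random.Random(0)
eps = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # white-noise residuals (no ARCH)
sq = [e * e for e in eps]                         # squared residuals
band = 1 / math.sqrt(len(sq))                     # asymptotic sd of rho(i)
rhos = [acf(sq, k) for k in range(1, 11)]
print(band, max(abs(r) for r in rhos), ljung_box_q(sq, 10))
```

For white noise the autocorrelations of the squared residuals stay inside the 1/√T band and Q is unremarkable; residuals with GARCH effects would push ρ(i) outside the band.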
For stock returns, the parameter is usually estimated to be positive; in this case, it reflects the
leverage effect, signifying that negative returns increase future volatility by a larger amount than
positive returns of the same magnitude.[1][2]
This model should not be confused with the NARCH model, together with the NGARCH
extension, introduced by Higgins and Bera in 1992.[clarification needed]
[edit] IGARCH
Integrated Generalized Autoregressive Conditional Heteroskedasticity (IGARCH) is a restricted
version of the GARCH model, where the persistent parameters sum to one, and therefore there is
a unit root in the GARCH process. The condition for this is
Σi=1..p βi + Σi=1..q αi = 1.
[edit] EGARCH
The exponential general autoregressive conditional heteroskedastic (EGARCH) model by
Nelson (1991) is another form of the GARCH model. Formally, an EGARCH(p, q) is:
log σt² = ω + Σk=1..q βk g(Zt−k) + Σk=1..p αk log σ²t−k
where g(Zt) = θZt + λ(|Zt| − E(|Zt|)), σt² is the conditional variance, ω, β, α, θ and λ
are coefficients, and Zt is a standard normal variable.
[edit] QGARCH
The Quadratic GARCH (QGARCH) model by Sentana (1995) is used to model asymmetric
effects of positive and negative shocks.
[edit] GJR-GARCH
Similar to QGARCH, the Glosten-Jagannathan-Runkle GARCH (GJR-GARCH) model by
Glosten, Jagannathan and Runkle (1993) also models asymmetry in the GARCH process. The
suggestion is to model εt = σtzt, where zt is i.i.d., and
σt² = K + δσ²t−1 + αε²t−1 + φε²t−1It−1,
where It−1 = 0 if εt−1 ≥ 0, and It−1 = 1 if εt−1 < 0.
[edit] fGARCH
Hentschel's fGARCH model[3], also known as Family GARCH, is an omnibus model that nests
a variety of other popular symmetric and asymmetric GARCH models including APARCH, GJR,
AVGARCH, NGARCH, etc.
Rate of return
In finance, rate of return (ROR), also known as return on investment (ROI), rate of profit or
sometimes just return, is the ratio of money gained or lost (whether realized or unrealized) on an
investment relative to the amount of money invested. The amount of money gained or lost may
be referred to as interest, profit/loss, gain/loss, or net income/loss. The money invested may be
referred to as the asset, capital, principal, or the cost basis of the investment. ROI is usually
expressed as a percentage rather than a fraction.
[edit]Calculation
The initial value of an investment, Vi, does not always have a clearly defined monetary value,
but for purposes of measuring ROI, the initial value must be clearly stated along with the
rationale for this initial value. The final value of an investment, Vf, also does not always have a
clearly defined monetary value, but for purposes of measuring ROI, the final value must be
clearly stated along with the rationale for this final value.[citation needed]
The rate of return can be calculated over a single period, or expressed as an average over
multiple periods.
[edit]Single-period
[edit]Arithmetic return
The arithmetic return is:
rarith = (Vf − Vi) / Vi
rarith is sometimes referred to as the yield. See also: effective interest rate, effective annual rate
(EAR) or annual percentage yield (APY).
[edit]Logarithmic or continuously compounded return
The logarithmic return or continuously compounded return, also known as force of interest,
is defined as:
rlog = ln(Vf / Vi)
It is the reciprocal of the e-folding time.
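The two single-period measures can be compared with a small sketch (illustrative code, not part of the article); note how a 50% arithmetic gain corresponds to a smaller logarithmic return:

```python
import math

def arithmetic_return(v_i, v_f):
    """r_arith = (V_f - V_i) / V_i"""
    return (v_f - v_i) / v_i

def log_return(v_i, v_f):
    """Continuously compounded return: r_log = ln(V_f / V_i)"""
    return math.log(v_f / v_i)

# A 50% arithmetic gain corresponds to a 40.55% logarithmic return,
# matching the comparison table later in the article.
print(arithmetic_return(100, 150))           # 0.5
print(round(log_return(100, 150) * 100, 2))  # 40.55
```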
[edit]Multiperiod average returns
[edit]Arithmetic average rate of return
The arithmetic average rate of return over n periods is defined as:
rarith,avg = (r1 + r2 + ... + rn) / n
[edit]Geometric average rate of return
The geometric average rate of return over n periods is defined as:
rgeo,avg = ((1 + r1)(1 + r2)...(1 + rn))^(1/n) − 1
The geometric average rate of return calculated over n years is also known as the annualized
return.
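A brief sketch (illustrative, not from the article) contrasting the two averages; the four hypothetical period returns are chosen so the arithmetic average is a healthy 5% while compounding actually loses money:

```python
def arithmetic_average(returns):
    """Simple mean of the period returns."""
    return sum(returns) / len(returns)

def geometric_average(returns):
    """Annualized return: compound the period returns, then take the n-th root."""
    growth = 1.0
    for r in returns:
        growth *= 1 + r
    return growth ** (1 / len(returns)) - 1

rets = [0.50, -0.20, 0.30, -0.40]  # a volatile four-period sequence
print(round(arithmetic_average(rets) * 100, 1))  # 5.0
print(round(geometric_average(rets) * 100, 1))   # -1.6
```

This is why the geometric (annualized) average, not the arithmetic one, describes what a buy-and-hold investor actually earned.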
[edit]Internal rate of return
Main article: Internal rate of return
The internal rate of return (IRR), also known as the dollar-weighted rate of return, is defined
as the value(s) of r that satisfies the following equation:
NPV = Σt Ct / (1 + r)^t = 0
where:
• NPV = net present value of the investment
• Ct = cashflow at time t
When the rate of return r is smaller than the IRR, the investment is profitable, i.e., NPV >
0. Otherwise, the investment is not profitable.
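Since the IRR equation generally has no closed-form solution, it is found numerically. A minimal sketch (illustrative; the cashflows are hypothetical and bisection assumes a single sign change of NPV):

```python
def npv(rate, cashflows):
    """Net present value of cashflows C_t at times t = 0, 1, 2, ..."""
    return sum(c / (1 + rate) ** t for t, c in enumerate(cashflows))

def irr(cashflows, lo=-0.99, hi=10.0, tol=1e-9):
    """Find the rate where NPV crosses zero, by bisection.
    Assumes NPV is positive at lo and changes sign once on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid, cashflows) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Invest 1000 now, receive 500 at the end of each of three years.
flows = [-1000, 500, 500, 500]
r = irr(flows)
print(r)  # roughly 0.234
```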
[edit]Comparisons between various rates of return
[edit]Arithmetic and logarithmic return
Arithmetic return, rarith     −100%     −50%      0%      50%      100%
Logarithmic return, rlog       −∞     −69.31%     0%    40.55%    69.31%
Steady rate of return:
End of Year                         1         2         3         4
Rate of Return                      5%        5%        5%        5%
Geometric Average at End of Year    5%        5%        5%        5%

Volatile rates of return, including losses:
End of Year                         1         2         3         4
Geometric Average at End of Year    50%       9.5%      16%       −1.6%
Capital at End of Year              $150.00   $120.00   $156.00   $93.60
Dollar Profit/(Loss)                ($6.40)
Compound Yield                      −1.6%

Highly volatile rates of return, including deep losses:
End of Year                         1         2         3         4
Geometric Average at End of Year    −95%      −77.6%    −63.2%    −42.7%
Dollar Profit/(Loss)                ($89.25)
Compound Yield                      −22.3%
Dollar Return                       $100      $55       $60       $50
• ROI values typically used for personal financial decisions include Annual
Rate of Return and Annualized Rate of Return. For nominal risk investments
such as savings accounts or Certificates of Deposit, the personal investor
considers the effects of reinvesting/compounding on increasing savings
balances over time. For investments in which capital is at risk, such as stock
shares, mutual fund shares and home purchases, the personal investor
considers the effects of price volatility and capital gain/loss on returns.
• Profitability ratios typically used by financial analysts to compare a
company’s profitability over time or compare profitability between
companies include Gross Profit Margin, Operating Profit Margin, ROI ratio,
Dividend yield, Net profit margin, Return on equity, and Return on assets.[2]
• During capital budgeting, companies compare the rates of return of different
projects to select which projects to pursue in order to generate maximum
return or wealth for the company's stockholders. Companies do so by
considering the average rate of return, payback period, net present value,
profitability index, and internal rate of return for various projects. [3]
• A return may be adjusted for taxes to give the after-tax rate of return. This is
done in geographical areas or historical times in which taxes consumed or
consume a significant portion of profits or income. The after-tax rate of return
is calculated by multiplying the rate of return by the tax rate, then
subtracting that percentage from the rate of return.
• A return of 5% taxed at 15% gives an after-tax return of 4.25%
0.05 x 0.15 = 0.0075
0.05 - 0.0075 = 0.0425 = 4.25%
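The after-tax arithmetic can be expressed as a one-line helper (illustrative sketch reproducing the worked example above):

```python
def after_tax_return(rate, tax_rate):
    """Subtract the taxed portion of the return: r - (r * tax)."""
    return rate - rate * tax_rate

# A 5% return taxed at 15%:
print(round(after_tax_return(0.05, 0.15), 4))  # 0.0425, i.e. 4.25%
```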
Investors usually seek a higher rate of return on taxable investment returns than on non-taxable
investment returns.
• A return may be adjusted for inflation to better indicate its true value in
purchasing power. Any investment with a nominal rate of return less than the
annual inflation rate represents a loss of value, even though the nominal rate
of return might well be greater than 0%. When ROI is adjusted for inflation,
the resulting return is considered an increase or decrease in purchasing
power. If an ROI value is adjusted for inflation, it is stated explicitly, such as
“The return, adjusted for inflation, was 2%.”
• Many online poker tools include ROI in a player's tracked statistics, assisting
users in evaluating an opponent's profitability.
Example: $1,000 invested in a CD paying 1% per quarter, interest reinvested
End of:          1st Quarter   2nd Quarter   3rd Quarter   4th Quarter
Interest         $10.00        $10.10        $10.20        $10.30
Quarterly ROI    1%            1%            1%            1%
The concept of 'income stream' may express this more clearly. At the beginning of the year, the
investor took $1,000 out of his pocket (or checking account) to invest in a CD at the bank. The
money was still his, but it was no longer available for buying groceries. The investment provided
a cash flow of $10.00, $10.10, $10.20 and $10.30. At the end of the year, the investor got
$1,040.60 back from the bank. $1,000 was return of capital.
Once interest is earned by an investor it becomes capital. Compound interest involves
reinvestment of capital; the interest earned during each quarter is reinvested. At the end of the
first quarter the investor had capital of $1,010.00, which then earned $10.10 during the second
quarter. The extra dime was interest on his additional $10 investment. The Annual Percentage
Yield or Future value for compound interest is higher than for simple interest because the interest
is reinvested as capital and earns interest. The yield on the above investment was 4.06%.
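The quarterly reinvestment just described can be sketched directly (illustrative code reproducing the CD example's numbers):

```python
def compound(principal, quarterly_rate, quarters):
    """Grow capital by reinvesting the interest earned each quarter."""
    capital = principal
    flows = []
    for _ in range(quarters):
        interest = capital * quarterly_rate
        flows.append(interest)
        capital += interest  # interest becomes capital and itself earns interest
    return capital, flows

capital, flows = compound(1000.0, 0.01, 4)
print([round(f, 2) for f in flows])            # [10.0, 10.1, 10.2, 10.3]
print(round(capital, 2))                       # 1040.6
print(round((capital / 1000.0 - 1) * 100, 2))  # 4.06 percent yield
```

The extra $0.60 over simple interest is exactly the interest earned on reinvested interest.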
Bank accounts offer contractually guaranteed returns, so investors cannot lose their capital.
Investors/Depositors lend money to the bank, and the bank is obligated to give investors back
their capital plus all earned interest. Because investors are not risking losing their capital on a
bad investment, they earn a quite low rate of return. But their capital steadily increases.
[edit]Returns when capital is at risk
[edit]Capital gains and losses
Many investments carry significant risk that the investor will lose some or all of the invested
capital. For example, investments in company stock shares put capital at risk. The value of a
stock share depends on what someone is willing to pay for it at a certain point in time. Unlike
capital invested in a savings account, the capital value (price) of a stock share constantly
changes. If the price is relatively stable, the stock is said to have “low volatility.” If the price
often changes a great deal, the stock has “high volatility.” All stock shares have some volatility,
and the change in price directly affects ROI for stock investments.

Example: Stock with low volatility and a regular quarterly dividend, reinvested
End of:             1st Quarter   2nd Quarter   3rd Quarter   4th Quarter
Dividend            $1            $1.01         $1.02         $1.03
Stock Price         $98           $101          $102          $99
Shares Purchased    0.010204      0.01          0.01          0.010404
Total Shares Held   1.010204      1.020204      1.030204      1.040608
Investment Value    $99           $103.04       $105.08       $103.02
Quarterly ROI       −1%           4.08%         1.98%         −1.96%
Stock returns are usually calculated for holding periods such as a month, a quarter or a year.
[edit]Reinvestment when capital is at risk: rate of return and yield
Yield is the compound rate of return that includes the effect of reinvesting interest or dividends.
The table above shows an example of a stock investment of one share purchased at the beginning
of the year for $100.
• The quarterly dividend is reinvested at the quarter-end stock price.
• The number of shares purchased each quarter = ($ Dividend)/($ Stock Price).
• The final investment value of $103.02 is a 3.02% Yield on the initial
investment of $100. This is the compound yield, and this return can be
considered to be the return on the investment of $100.
To calculate the rate of return, the investor includes the reinvested dividends in the total
investment. The investor received a total of $4.06 in dividends over the year, all of which were
reinvested, so the investment amount increased by $4.06.
• Total Investment = Cost Basis = $100 + $4.06 = $104.06.
• Capital gain/loss = $103.02 - $104.06 = -$1.04 (a capital loss)
• ($4.06 dividends - $1.04 capital loss ) / $104.06 total investment = 2.9% ROI
The disadvantage of this ROI calculation is that it does not take into account the fact that not all
the money was invested during the entire year (the dividend reinvestments occurred throughout
the year). The advantages are: (1) it uses the cost basis of the investment, (2) it clearly shows
which gains are due to dividends and which gains/losses are due to capital gains/losses, and (3)
the actual dollar return of $3.02 is compared to the actual dollar investment of $104.06.
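The dividend-reinvestment mechanics can be traced in a short sketch (illustrative; it assumes a $1-per-share dividend each quarter, which is consistent with the table's dividend column):

```python
def reinvest(initial_shares, dividend_per_share, prices):
    """Reinvest a fixed per-share dividend at each quarter-end price."""
    shares = initial_shares
    dividends = 0.0
    for price in prices:
        paid = shares * dividend_per_share
        dividends += paid
        shares += paid / price  # the dividend buys additional shares
    return shares, dividends

prices = [98, 101, 102, 99]  # quarter-end prices from the table above
shares, dividends = reinvest(1.0, 1.0, prices)
final_value = shares * prices[-1]
print(round(dividends, 2))                      # 4.06
print(round(final_value, 2))                    # 103.02
print(round((final_value / 100 - 1) * 100, 2))  # 3.02 percent compound yield
```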
For U.S. income tax purposes, if the shares were sold at the end of the year, dividends would be
$4.06, cost basis of the investment would be $104.06, sale price would be $103.02, and the
capital loss would be $1.04.
Since all returns were reinvested, the ROI might also be calculated as a continuously
compounded return or logarithmic return. The effective continuously compounded rate of
return is the natural log of the final investment value divided by the initial investment value:
rlog = ln(Vf / Vi)
• Vi is the initial investment ($100)
• Vf is the final value ($103.02)
rlog = ln(103.02 / 100) ≈ 0.0298, or 2.98%.
[edit]Mutual fund and investment company returns
Mutual funds, exchange-traded funds (ETFs), and other equitized investments (such as unit
investment trusts or UITs, insurance separate accounts and related variable products such as
variable universal life insurance policies and variable annuity contracts, and bank-sponsored
commingled funds, collective benefit funds or common trust funds) are essentially portfolios of
various investment securities such as stocks, bonds and money market instruments which are
equitized by selling shares or units to investors. Investors and other parties are interested to know
how the investment has performed over various periods of time.
Performance is usually quantified by a fund's total return. In the 1990s, many different fund
companies were advertising various total returns-- some cumulative, some averaged, some with
or without deduction of sales loads or commissions, etc. To level the playing field and help
investors compare performance returns of one fund to another, the U.S. Securities and Exchange
Commission (SEC) began requiring funds to compute and report total returns based upon a
standardized formula-- so called "SEC Standardized total return" which is the average annual
total return assuming reinvestment of dividends and distributions and deduction of sales loads or
charges. Funds may compute and advertise returns on other bases (so-called "non-standardized"
returns), so long as they also publish no less prominently the "standardized" return data.
Subsequent to this, apparently investors who'd sold their fund shares after a large increase in the
share price in the late 1990s and early 2000s were ignorant of how significant the impact of
income/capital gain taxes was on their fund "gross" returns. That is, they had little idea how
significant the difference could be between "gross" returns (returns before federal taxes) and
"net" returns (after-tax returns). In reaction to this apparent investor ignorance, and perhaps for
other reasons, the SEC made further rule-making to require mutual funds to publish in their
annual prospectus, among other things, total returns before and after the impact of U.S. federal
individual income taxes. And further, the after-tax returns would include 1) returns on a
hypothetical taxable account after deducting taxes on dividends and capital gain distributions
received during the illustrated periods and 2) the impacts of the items in #1) as well as assuming
the entire investment shares were sold at the end of the period (realizing capital gain/loss on
liquidation of the shares). These after-tax returns would apply of course only to taxable accounts
and not to tax-deferred or retirement accounts such as IRAs.
Lastly, in more recent years, "personalized" investment returns have been demanded by
investors. In other words, investors are saying, more or less, that the fund's published returns may
not match their actual account returns, based upon the actual investment account transaction history.
This is because investments may have been made on various dates and additional purchases and
withdrawals may have occurred which vary in amount and date and thus are unique to the
particular account. More and more fund and brokerage firms have begun providing personalized
account returns on investor's account statements in response to this need.
With that out of the way, here's how basic earnings and gains/losses work on a mutual fund. The
fund records income for dividends and interest earned which typically increases the value of the
mutual fund shares, while expenses set aside have an offsetting impact to share value. When the
fund's investments increase (or decrease) in market value, so too does the value of the fund shares
(or units) owned by the investors. When the fund sells investments at a profit, it turns or
reclassifies that paper profit or unrealized gain into an actual or realized gain. The sale has no
effect on the value of fund shares but it has reclassified a component of its value from one bucket
to another on the fund books, which will have future impact to investors. At least annually, a
fund usually pays dividends from its net income (income less expenses) and net capital gains
realized out to shareholders as an IRS requirement. This way, the fund pays no taxes but rather
all the investors in taxable accounts do. Mutual fund share prices are typically valued each day
the stock or bond markets are open and typically the value of a share is the net asset value of the
fund shares investors own.
[edit]Total returns
This section addresses only total returns without the impact of U.S. federal individual income
and capital gains taxes.
Mutual funds report total returns assuming reinvestment of dividend and capital gain
distributions. That is, the dollar amounts distributed are used to purchase additional shares of the
funds as of the reinvestment/ex-dividend date. Reinvestment rates or factors are based on total
distributions (dividends plus capital gains) during each period.
Total Return = ((Final Price x Last Reinvestment Factor) - Beginning Price) / Beginning Price
[edit]Average annual total return (geometric)
US mutual funds are to compute average annual total return as prescribed
by the U.S. Securities and Exchange Commission (SEC) in instructions to Form N-1A (the fund
prospectus): the average annual compounded rates of return for 1-year, 5-year and 10-year
periods (or inception of the fund if shorter), reported as the "average annual total return" for each fund.
The following formula is used:[4]
P(1 + T)^n = ERV
Where:
P = a hypothetical initial investment of $1,000.
T = average annual total return.
n = number of years.
ERV = ending redeemable value, at the end of the 1-, 5-, or 10-year period (or fractional
portion), of a hypothetical $1,000 payment made at the beginning of that period.
Solving for T gives T = (ERV / P)^(1/n) − 1.
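The SEC formula can be inverted in one line; a minimal sketch (the $1,500 redemption value is a hypothetical number, not from the article):

```python
def average_annual_total_return(P, ERV, n):
    """Solve P * (1 + T)**n = ERV for the average annual total return T."""
    return (ERV / P) ** (1 / n) - 1

# A hypothetical $1,000 investment that redeems for $1,500 after 5 years:
T = average_annual_total_return(1000, 1500, 5)
print(round(T * 100, 2))  # 8.45
```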
[edit]Example
Example: Mutual Fund with low volatility and a regular annual dividend,
reinvested at year-end share price, initial share value $100
End of Year          1    2    3    4    5
Dividend             $5   $5   $5   $5   $5
Total Distribution   $5   $5   $7   $5   $5
Stable distribution
Parameters:
α ∈ (0, 2] — stability exponent
β ∈ [−1, 1] — skewness parameter (note that skewness of the distribution itself is undefined)
c ∈ (0, ∞) — scale parameter
μ ∈ (−∞, ∞) — location parameter
Support:   x ∈ R, or a half-line if α < 1 and |β| = 1
pdf:       usually not analytically expressible (see text)
cdf:       usually not analytically expressible (see text)
Mean:      undefined when α ≤ 1, otherwise μ
Median:    usually not analytically expressible (see text); equal to μ when β = 0
Mode:      usually not analytically expressible (see text); equal to μ when β = 0
Variance:  infinite except when α = 2, when it is 2c²
Skewness:  undefined except when α = 2, when it is 0
Kurtosis:  undefined except when α = 2, when it is 0
Entropy:   not analytically expressible (see text)
mgf:       undefined
cf:        exp[itμ − |ct|^α (1 − iβ sgn(t)Φ)], with Φ = tan(πα/2) for α ≠ 1 and Φ = −(2/π)log|t| for α = 1
In probability theory, a random variable is said to be stable (or to have a stable distribution) if
it has the property that a linear combination of two independent copies of the variable has the
same distribution, up to location and scale parameters. The stable distribution family is also
sometimes referred to as the Lévy alpha-stable distribution.
The importance of stable probability distributions is that they are "attractors" for properly
normed sums of independent and identically-distributed (iid) random variables. The normal
distribution is one family of stable distributions. By the classical central limit theorem the
properly normed sum of a set of random variables, each with finite variance, will tend towards a
normal distribution as the number of variables increases. Without the finite variance assumption
the limit may be a stable distribution. Stable distributions that are non-normal are often called
stable Paretian distributions, after Vilfredo Pareto.
[edit] Definition
The stable distributions are defined by the following property:
Let X1 and X2 be independent copies of a random variable X. Random variable X is said to
be stable if for any constants a and b the random variable aX1 + bX2 has the same
distribution as cX + d with some constants c and d. The distribution is said to be strictly
stable if this holds with d = 0 (Nolan 2009).
Since the normal distribution, the Cauchy distribution, and the Lévy distribution all have the
above property, it follows that they are special cases of stable distributions.
Such distributions form a four-parameter family of continuous probability distributions
parametrized by location and scale parameters μ and c, respectively, and two shape parameters β
and α, roughly corresponding to measures of asymmetry and concentration, respectively (see the
figures).
Although the probability density function for a general stable distribution cannot be written
analytically, the general characteristic function can be, and any probability distribution is
determined by its characteristic function. A random variable X is called stable if its
characteristic function is given by (Nolan 2009)(Voit 2003 § 5.4.3)
φ(t) = exp[ itμ − |ct|^α (1 − iβ sgn(t)Φ) ]
where sgn(t) is the sign of t, Φ = tan(πα/2) for α ≠ 1, and Φ = −(2/π)log|t| only when α = 1.
[edit] Parameterizations
The above definition is only one of the parameterizations in use for stable distributions; it is the
most common but is not continuous in the parameters. For example, for the case α = 1 we could
replace Φ by: (Nolan 2009)
and by
This parameterization has the advantage that we may define a standard distribution using
and
The pdf for all α will then have the following standardization property:
[edit] Applications
Stable distributions owe their importance in both theory and practice to the generalization of the
Central Limit Theorem to random variables without second (and possibly first) order moments
and the accompanying self-similarity of the stable family. It was the seeming departure from
normality along with the demand for a self-similar model for financial data (i.e. the shape of the
distribution for yearly asset price changes should resemble that of the constituent daily or
monthly price changes) that led Benoît Mandelbrot to propose that cotton prices follow an alpha-
stable distribution with α equal to 1.7. Lévy distributions are frequently found in analysis of
critical behavior and financial data (Voit 2003 § 5.4.3).
They are also found in spectroscopy as a general expression for a quasistatically
pressure-broadened spectral line (Peach 1981 § 4.5).
[edit] Properties
• All stable distributions are infinitely divisible.
• With the exception of the normal distribution (α = 2), stable distributions are
leptokurtotic and heavy-tailed distributions.
• Closure under convolution
Stable distributions are closed under convolution for a fixed value of α. Since convolution is
equivalent to multiplication of the Fourier-transformed function, it follows that the product of
two stable characteristic functions with the same α will yield another such characteristic
function. The product of two stable characteristic functions is given by:
φ1(t)φ2(t) = exp[ it(μ1 + μ2) − (|c1t|^α + |c2t|^α) + i sgn(t)Φ(β1|c1t|^α + β2|c2t|^α) ]
Since Φ is not a function of the μ, c or β variables, it follows that these parameters for the
convolved function are given by:
c^α = c1^α + c2^α
β = (β1c1^α + β2c2^α) / (c1^α + c2^α)
μ = μ1 + μ2
In each case, it can be shown that the resulting parameters lie within the required intervals for a
stable distribution.
[edit] Other definitions of stability
Below we give frequently used equivalent definitions of stability (Nolan 2009),(Voit 2003 §
5.4.3).
A random variable X is called stable if for n independent copies Xi of X there exist constants
cn > 0 and dn such that
X1 + X2 + ... + Xn =d cn X + dn
(equality of distributions).
[edit] The distribution
A stable distribution is therefore specified by the above four parameters. It can be shown that any
stable distribution has a continuous (even smooth) density function. If f(x; α, β, c, μ) denotes the
density, then the asymptotic behavior is described, for α < 2, by (Nolan, Theorem 1.12):
f(x) ∼ α c^α (1 + β) sin(πα/2) (Γ(α)/π) |x|^(−(α+1)) as x → ∞,
where Γ is the Gamma function (except that when α < 1 and β = 1 or −1, the tail vanishes to the
left or right, resp., of μ). This "heavy tail" behavior causes the variance of Lévy distributions to
be infinite for all α < 2. This property is illustrated in the log-log plots below.
When α=2, the distribution is Gaussian (see below), with tails asymptotic to exp(−x2/4c2)/(2c√π).
[edit] Special cases
Log-log plot of symmetric centered stable distribution PDF's showing the power law behavior
for large x. The power law behavior is evidenced by the straight-line appearance of the PDF for
large x, with the slope equal to -(α+1). (The only exception is for α = 2, in black, which is a
normal distribution.)
Log-log plot of skewed centered stable distribution PDF's showing the power law behavior for
large x. Again the slope of the linear portions is equal to -(α+1)
There is no general analytic solution for the form of p(x). There are, however, three special cases
which can be analytically expressed, as can be seen by inspection of the characteristic function.
• For α = 2 the distribution reduces to a Gaussian distribution with variance σ2 = 2c2 and
mean μ and the skewness parameter β has no effect (Nolan 2009)(Voit 2003 § 5.4.3).
• For α = 1 and β = 0 the distribution reduces to a Cauchy distribution with scale
parameter c and shift parameter μ(Voit 2003 § 5.4.3)(Nolan 2009).
• For α = 1 / 2 and β = 1 the distribution reduces to a Lévy distribution with scale
parameter c and shift parameter μ. (Peach 1981 § 4.5)(Nolan 2009)
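The Gaussian and Cauchy special cases can be checked numerically from the characteristic function; the sketch below (illustrative, using the common parameterization from the definition above, with arbitrary c and μ) confirms that α = 2 reproduces a Gaussian with variance 2c² and that α = 1, β = 0 reproduces a Cauchy with scale c:

```python
import cmath
import math

def stable_cf(t, alpha, beta, c, mu):
    """Characteristic function of a stable law (common parameterization)."""
    if t == 0:
        return 1.0 + 0j
    sign = 1.0 if t > 0 else -1.0
    if alpha == 1:
        phi = -(2 / math.pi) * math.log(abs(t))
    else:
        phi = math.tan(math.pi * alpha / 2)
    return cmath.exp(1j * t * mu - abs(c * t) ** alpha * (1 - 1j * beta * sign * phi))

c, mu = 1.3, 0.7
for t in [-2.0, -0.5, 0.3, 1.7]:
    # alpha = 2: Gaussian with variance 2c^2 (beta has no effect since tan(pi) = 0)
    gauss = cmath.exp(1j * mu * t - (c * t) ** 2)
    assert abs(stable_cf(t, 2.0, 0.5, c, mu) - gauss) < 1e-12
    # alpha = 1, beta = 0: Cauchy with scale c and shift mu
    cauchy = cmath.exp(1j * mu * t - c * abs(t))
    assert abs(stable_cf(t, 1.0, 0.0, c, mu) - cauchy) < 1e-12
print("special cases match")
```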
Note that the above three distributions are also connected, in the following way: A standard
Cauchy random variable can be viewed as a mixture of Gaussian random variables (all with
mean zero), with the variance being drawn from a standard Lévy distribution. And in fact this is
a special case of a more general theorem which allows any symmetric alpha-stable distribution to
be viewed in this way (with the alpha parameter of the mixture distribution equal to twice the
alpha parameter of the mixing distribution—and the beta parameter of the mixing distribution
always equal to unity).
Other special cases are:
• In the limit as c approaches zero or as α approaches zero the distribution will approach a
Dirac delta function δ(x − μ).
• For α = 1 and β = 1, the distribution is a Landau distribution which has a specific usage
in physics under this name.
[edit] The generalized central limit theorem
Another important property of stable distributions is the role that they play in a generalized
central limit theorem. The central limit theorem states that the sum of a number of random
variables with finite variances will tend to a normal distribution as the number of variables
grows. A generalization due to Gnedenko and Kolmogorov states that the sum of a number of
random variables with power-law tail distributions decreasing as 1/|x|^(α+1) where 0 < α < 2
(and therefore having infinite variance) will tend to a stable distribution f(x;α,0,c,0) as the
number of variables grows. (Voit 2003 § 5.4.3)
[edit] Series representation
The stable distribution can be restated as the real part of a simpler integral:(Peach 1981 § 4.5)
where q = c^α(1 − iβΦ). Reversing the order of integration and summation, and carrying out
the integration yields:
which will be valid for and will converge for appropriate values of the parameters. (Note
that the n=0 term which yields a delta function in x − μ has therefore been dropped.) Expressing
the first exponential as a series will yield another series in positive powers of x − μ which is
generally less useful.
Absolute deviation
From Wikipedia, the free encyclopedia
Jump to: navigation, search
In statistics, the absolute deviation of an element of a data set is the absolute difference between
that element and a given point. Typically the point from which the deviation is measured is a
measure of central tendency, most often the median or sometimes the mean of the data set:
Di = | xi − m(X) |
where
Di is the absolute deviation,
xi is the data element, and
m(X) is the chosen measure of central tendency of the data set.
[edit]Measures of dispersion
Several measures of statistical dispersion are defined in terms of the absolute deviation.
[edit]Average absolute deviation
The average absolute deviation, or simply average deviation of a data set is the average of the
absolute deviations and is a summary statistic of statistical dispersion or variability. It is also
called the mean absolute deviation, but this is easily confused with the median absolute
deviation.
The average absolute deviation of a set {x1, x2, ..., xn} is
D = (1/n) Σi=1..n | xi − m(X) |
The choice of measure of central tendency, m(X), has a marked effect on the value of the
average deviation. For example, for the data set {2, 2, 3, 4, 14}:
Measure of central tendency m(X)    Average absolute deviation
Mean = 5                            (3 + 3 + 2 + 1 + 9) / 5 = 3.6
Median = 3                          (1 + 1 + 0 + 1 + 11) / 5 = 2.8
Mode = 2                            (0 + 0 + 1 + 2 + 12) / 5 = 3.0
The average absolute deviation from the median is less than or equal to the average absolute
deviation from the mean. In fact, the average absolute deviation from the median is always less
than or equal to the average absolute deviation from any other fixed number.
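The comparison can be computed directly; a minimal sketch (illustrative code, using the article's data set) showing that the median gives the smallest average absolute deviation:

```python
def average_absolute_deviation(data, m):
    """Mean of |x_i - m| over the data set."""
    return sum(abs(x - m) for x in data) / len(data)

data = [2, 2, 3, 4, 14]
print(average_absolute_deviation(data, 5))  # from the mean:   3.6
print(average_absolute_deviation(data, 3))  # from the median: 2.8
print(average_absolute_deviation(data, 2))  # from the mode:   3.0
```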
The average absolute deviation from the mean is less than or equal to the standard deviation; one
way of proving this relies on Jensen's inequality.
If x is a Gaussian random variable with a mean of 0, then, in expectation for large n, the ratio of
standard deviation to mean absolute deviation should satisfy the following equality [1]:
σ / MAD = √(π/2) ≈ 1.2533
In other words, for a Gaussian, mean absolute deviation is about 0.8 times the standard deviation.
[edit]Mean absolute deviation
The mean absolute deviation (MAD) is the mean absolute deviation from the mean. A related
quantity, the mean absolute error (MAE), is a common measure of forecast error in time series
analysis, where this measures the average absolute deviation of observations from their forecasts.
Although the term mean deviation is used as a synonym for mean absolute
deviation, to be precise it is not the same; in its strict interpretation (namely, omitting
the absolute value operation), the mean deviation of any data set from its mean is always zero.
[edit]Median absolute deviation
Main article: Median absolute deviation
The median absolute deviation (also MAD) is the median absolute deviation from the median. It
is a robust estimator of dispersion.
For the example {2, 2, 3, 4, 14}: 3 is the median, so the absolute deviations from the median are
{1, 1, 0, 1, 11} (or reordered as {0, 1, 1, 1, 11}) with a median absolute deviation of 1, in this
case unaffected by the value of the outlier 14.
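The robustness of the median absolute deviation to the outlier is easy to verify numerically; a sketch (the enlarged outlier value 1000 is an arbitrary choice):

```python
from statistics import median, stdev

def median_abs_dev(data):
    """Median absolute deviation from the median."""
    m = median(data)
    return median(abs(x - m) for x in data)

data = [2, 2, 3, 4, 14]
outlier_data = [2, 2, 3, 4, 1000]  # same set, outlier made much larger

# MAD is unaffected by the magnitude of the outlier ...
assert median_abs_dev(data) == 1
assert median_abs_dev(outlier_data) == 1

# ... while the standard deviation is blown up by it.
assert stdev(outlier_data) > 50 * stdev(data)
```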
[edit]Maximum absolute deviation
The maximum absolute deviation about a point is the maximum of the absolute deviations of a
sample from that point. It is realized by the sample maximum or sample minimum and cannot be
less than half the range.
[edit]Minimization
The measures of statistical dispersion derived from absolute deviation characterize various
measures of central tendency as minimizing dispersion: The median is the measure of central
tendency most associated with the absolute deviation, in that
L2 norm statistics
L1 norm statistics
L∞ norm statistics
[edit]Estimation
The mean absolute deviation of a sample is a biased estimator of the mean absolute deviation of
the population.
Normal distribution
From Wikipedia, the free encyclopedia
This article is about the univariate normal distribution. For normally distributed vectors, see
Multivariate normal distribution. For matrices, see Matrix normal distribution. For stochastic
processes, see Gaussian process.
Notation: N(μ, σ2)
Support: x ∈ R if σ2 > 0; x = μ if σ2 = 0
Mean: μ
Median: μ
Mode: μ
Variance: σ2
Skewness: 0
Excess kurtosis: 0
In the middle of the 19th century Maxwell demonstrated that the normal distribution is not only a
convenient mathematical tool, but that it also appears in nature. He writes[4]: “The number of
particles whose velocity, resolved in a certain direction, lies between x and x+dx is
It was Pearson who first wrote the distribution in terms of the standard deviation σ, as in modern
notation. Soon after this, in 1915, Fisher added the location parameter to the formula for the
normal distribution, expressing it in the way it is written nowadays.
Since its introduction, the normal distribution has been known by many different names: the law
of error, the law of facility of errors, Laplace’s second law, Gaussian law, etc. Curiously, it has
never been known under the name of its inventor, de Moivre. The name “normal distribution”
was coined independently by Peirce, Galton and Lexis around 1875; the term was derived from
the fact that this distribution was seen as typical, common, normal. This name was popularized
in the statistical community by Pearson around the turn of the 20th century.[5]
The term “standard normal” which denotes the normal distribution with zero mean and unit
variance came into general use around the 1950s, appearing in the popular textbooks by P.G. Hoel
(1947) “Introduction to mathematical statistics” and A.M. Mood (1950) “Introduction to the
theory of statistics”.[6]
[edit] Definition
The simplest case of a normal distribution is known as the standard normal distribution,
described by the probability density function
ϕ(x) = exp(−x2/2) / √(2π).
The constant 1/√(2π) in this expression ensures that the total area under the curve ϕ(x) is equal to
one,[proof] and the 1⁄2 in the exponent makes the “width” of the curve (measured as half of the
distance between the inflection points of the curve) also equal to one. It is traditional[7] in statistics to
denote this function with the Greek letter ϕ (phi), whereas density functions for all other
distributions are usually denoted with letters ƒ or p. The alternative glyph φ is also used quite
often, however within this article we reserve “φ” to denote characteristic functions.
More generally, a normal distribution results from exponentiating a quadratic function (just as an
exponential distribution results from exponentiating a linear function):
f(x) = exp(ax2 + bx + c).
This yields the classic “bell curve” shape (provided that a < 0 so that the quadratic function is
concave). Notice that f(x) > 0 everywhere. One can adjust a to control the “width” of the bell,
then adjust b to move the central peak of the bell along the x-axis, and finally adjust c to control
the “height” of the bell. For f(x) to be a true probability density function over R, one must choose
c such that the total area under the curve is one, which gives
f(x) = exp(−(x − μ)2/(2σ2)) / (σ√(2π)) = (1/σ)·ϕ((x − μ)/σ), with μ = −b/(2a) and σ2 = −1/(2a).
Notice that for a standard normal distribution, μ = 0 and σ2 = 1. The last part of the equation
above shows that any other normal distribution can be regarded as a version of the standard
normal distribution that has been stretched horizontally by a factor σ and then translated
rightward by a distance μ. Thus, μ specifies the position of the bell curve’s central peak, and σ
specifies the “width” of the bell curve.
The parameter μ is at the same time the mean, the median and the mode of the normal
distribution. The parameter σ2 is called the variance; as for any real-valued random variable, it
describes how concentrated the distribution is around its mean. The square root of σ2 is called the
standard deviation; it measures the width of the density function and equals the distance from the
mean to each inflection point.
Some authors[8] instead of σ2 use its reciprocal τ = σ−2, which is called the precision. This
parameterization has an advantage in numerical applications where σ2 is very close to zero and is
more convenient to work with in analysis as τ is a natural parameter of the normal distribution.
Another advantage of using this parameterization is in the study of conditional distributions in
multivariate normal case.
Normal distribution is denoted as N(μ, σ2). Commonly the letter N is written in calligraphic font
(typed as \mathcal{N} in LaTeX). Thus when a random variable X is distributed normally with
mean μ and variance σ2, we write
[edit] Characterization
In the previous section the normal distribution was defined by specifying its probability density
function. However there are other ways to characterize a probability distribution. They include:
the cumulative distribution function, the moments, the cumulants, the characteristic function, the
moment-generating function, etc.
[edit] Probability density function
The continuous probability density function of the normal distribution exists only when the
variance parameter σ2 is not equal to zero. Then it is given by the Gaussian function
f(x) = exp(−(x − μ)2/(2σ2)) / (σ√(2π)).
When σ2 = 0 the density degenerates to a Dirac delta at μ: this isn’t a function in the usual sense,
but rather a generalized function; it is equal to infinity at x = μ and zero elsewhere.
Properties:
• Function ƒ(x) is symmetric around x = μ, which is at the same time the mode, the median
and the mean of the distribution.
• The inflection points of the curve occur one standard deviation away from the mean (i.e.,
at x = μ − σ and x = μ + σ).
• The standard normal density ϕ(x) is an eigenfunction of the Fourier transform.
• The function is supersmooth of order 2, implying that it is infinitely differentiable.
• The derivative of ϕ(x) is ϕ′(x) = −x·ϕ(x), the second derivative is ϕ′′(x) = (x2 − 1)ϕ(x).
[edit] Cumulative distribution function
See also: Error function, Q-function, and Standard normal table
The cumulative distribution function (cdf) of a random variable X evaluated at a number x is the
probability of the event that X is less than or equal to x. The cdf of the standard normal
distribution is denoted with the capital Greek letter Φ (phi), and can be computed as an integral of
the probability density function:
Φ(x) = ∫_{−∞}^{x} ϕ(t) dt.
This integral cannot be expressed in terms of elementary functions; however, with the use of the
special function erf, called the error function, the standard normal cdf Φ(x) can be written as
Φ(x) = ½·[1 + erf(x/√2)].
The complement of the standard normal cdf, 1 − Φ(x), is often denoted Q(x), and is referred to as
the Q-function, especially in engineering texts.[9][10] This represents the tail probability of the
Gaussian distribution, that is the probability that a standard normal random variable X is greater
than the number x:
Other definitions of the Q-function, all of which are simple transformations of Φ, are also used
occasionally.[11]
The inverse of the standard normal cdf, called the quantile function or probit function, can be
expressed in terms of the inverse error function:
It is conventional to use the letter z to denote the quantiles of the standard normal cdf, unless that
letter is already used for some other purpose.
The values Φ(x) may be approximated very accurately by a variety of methods, such as
numerical integration, Taylor series, asymptotic series and continued fractions. For large values
of x it is usually easier to work with the Q-function.
For a generic normal random variable with mean μ and variance σ2 > 0 the cdf is
F(x) = Φ((x − μ)/σ).
For a normal distribution with zero variance, the cdf is the Heaviside step function:
F(x) = H(x − μ).
Properties:
• The standard normal cdf is symmetric around point (0, ½): Φ(−x) = 1 − Φ(x).
• The derivative of Φ(x) is equal to the standard normal pdf ϕ(x): Φ′(x) = ϕ(x).
• The antiderivative of Φ(x) is: ∫ Φ(x) dx = xΦ(x) + ϕ(x).
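The erf form gives a practical way to evaluate Φ and check these properties; a minimal sketch using math.erf:

```python
import math

def std_normal_cdf(x):
    """Standard normal cdf via the error function: Phi(x) = (1 + erf(x/sqrt(2)))/2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Phi(0) = 1/2, and the symmetry property Phi(-x) = 1 - Phi(x).
assert std_normal_cdf(0.0) == 0.5
x = 1.3
assert abs(std_normal_cdf(-x) - (1.0 - std_normal_cdf(x))) < 1e-15
```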
[edit] Characteristic function
The characteristic function φX(t) of a random variable X is defined as the expected value of eitX,
where i is the imaginary unit, and t ∈ R is the argument of the characteristic function. Thus the
characteristic function is the Fourier transform of the density ϕ(x).
For the standard normal random variable, the characteristic function is
φ(t) = exp(−t2/2).
For a generic normal distribution with mean μ and variance σ2, the characteristic function is [12]
φX(t) = exp(iμt − σ2t2/2).
The cumulant generating function is the logarithm of the moment generating function:
g(t) = ln E[etX] = μt + σ2t2/2.
Since this is a quadratic polynomial in t, only the first two cumulants are nonzero: the mean μ and the variance σ2.
[edit] Properties
1. The family of normal distributions is closed under linear transformations. That is, if X is
normally distributed with mean μ and variance σ2, then a linear transform aX + b (for
some real numbers a ≠ 0 and b) is also normally distributed:
Also if X1, X2 are two independent normal random variables, with means μ1, μ2 and
standard deviations σ1, σ2, then their linear combination will also be normally distributed:
[proof]
2. The converse of (1) is also true: if X1 and X2 are independent and their sum X1 + X2 is
distributed normally, then both X1 and X2 must also be normal. This is known as Cramér’s
theorem.
3. Normal distribution is infinitely divisible: for a normally distributed X with mean μ and
variance σ2 we can find n independent random variables {X1, …, Xn} each distributed
normally with means μ/n and variances σ2/n such that
4. Normal distribution is stable (with exponent α = 2): if X1, X2 are two independent N(μ, σ2)
random variables and a, b are arbitrary real numbers, then
where X3 is also N(μ, σ2). This relationship directly follows from property (1).
5. The Kullback–Leibler divergence between two normal distributions X1 ∼ N(μ1, σ12) and
X2 ∼ N(μ2, σ22) is given by:[13]
6. The Fisher information matrix for the normal distribution is diagonal and takes the form
I(μ, σ2) = diag(1/σ2, 1/(2σ4)).
7. The normal distribution is a two-parameter exponential family with natural parameters
μ/σ2 and −1/(2σ2), and natural statistics x and x2. The dual, expectation parameters for the normal
distribution are η1 = μ and η2 = μ2 + σ2.
8. Of all probability distributions over the reals with mean μ and variance σ2, the normal
distribution N(μ, σ2) is the one with the maximum entropy.
9. The family of normal distributions forms a manifold with constant curvature −1. The
same family is flat with respect to the (±1)-connections ∇(e) and ∇(m).[14]
[edit] Standardizing normal random variables
As a consequence of property 1, it is possible to relate all normal random variables to the
standard normal. For example if X is normal with mean μ and variance σ2, then
has mean zero and unit variance, that is Z has the standard normal distribution. Conversely,
having a standard normal random variable Z we can always construct another normal random
variable with specific mean μ and variance σ2:
This “standardizing” transformation is convenient as it allows one to compute the pdf and
especially the cdf of a normal distribution from a table of pdf and cdf values for the standard
normal. They are related via
F(x) = Φ((x − μ)/σ),  f(x) = (1/σ)·ϕ((x − μ)/σ).
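This standardizing recipe is how probabilities for a general N(μ, σ2) variable are computed in practice; a small sketch:

```python
import math

def phi(x):
    """Standard normal cdf via erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """cdf of N(mu, sigma^2) obtained by standardizing: F(x) = Phi((x - mu)/sigma)."""
    return phi((x - mu) / sigma)

# P(X <= mu) is 1/2 for any normal distribution ...
assert normal_cdf(10.0, mu=10.0, sigma=3.0) == 0.5

# ... and P(mu - sigma < X <= mu + sigma) is about 0.6827 (the 68% rule).
p = normal_cdf(13.0, 10.0, 3.0) - normal_cdf(7.0, 10.0, 3.0)
assert abs(p - 0.6826894921) < 1e-9
```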
[edit] Moments
The normal distribution has moments of all orders. That is, for a normally distributed X with
mean μ and variance σ2, the expectation E[|X|p] exists and is finite for all p such that Re[p] > −1.
Usually we are interested only in moments of integer orders: p = 1, 2, 3, ….
• Central moments are the moments of X around its mean μ. Thus, the central moment of
order p is the expected value of (X − μ)p. Using standardization of the normal distribution,
this expectation equals σp·E[Zp], where Z is standard normal:
E[(X − μ)p] = σp·(p − 1)!!·1{p is even}.
Here n!! denotes the double factorial, that is the product of every other number from n to
1; and 1{…} is the indicator function.
• Central absolute moments are the moments of |X − μ|. They coincide with regular
moments for all even orders, but are nonzero for all odd p’s.
• Raw moments and raw absolute moments are the moments of X and |X| respectively.
The formulas for these moments are much more complicated, and are given in terms of the
confluent hypergeometric functions 1F1 and U.
Order p   Central moment E[(X − μ)p]
1         0
2         σ2
3         0
4         3σ4
5         0
6         15σ6
7         0
8         105σ8
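The double factorial pattern for the central moments (odd orders vanish, even orders equal σp·(p − 1)!!) can be checked by numerically integrating zp·ϕ(z); a rough sketch using a midpoint sum, with illustrative step and cutoff values:

```python
import math

def phi_pdf(z):
    """Standard normal density."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def central_moment(p, h=1e-3, lim=12.0):
    """E[Z^p] for standard normal Z, via a midpoint Riemann sum on [-lim, lim]."""
    n = int(2 * lim / h)
    return sum((-lim + (k + 0.5) * h) ** p * phi_pdf(-lim + (k + 0.5) * h) * h
               for k in range(n))

# Odd moments vanish; even moments follow the double factorial pattern.
assert abs(central_moment(3)) < 1e-9          # E[Z^3] = 0
assert abs(central_moment(4) - 3.0) < 1e-3    # E[Z^4] = 3!! = 3
assert abs(central_moment(6) - 15.0) < 1e-2   # E[Z^6] = 5!! = 15
```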
The theorem will hold even if the summands Xi are not iid, although some constraints on the
degree of dependence and the growth rate of moments still have to be imposed.
The importance of the central limit theorem cannot be overemphasized. A great number of test
statistics, scores, and estimators encountered in practice contain sums of certain random
variables in them, even more estimators can be represented as sums of random variables through
the use of influence functions — all of these quantities are governed by the central limit theorem
and will have asymptotically normal distribution as a result.
Plot of the pdf of a normal distribution with μ = 12 and σ = 3, approximating the pdf of a
binomial distribution with n = 48 and p = 1/4
Another practical consequence of the central limit theorem is that certain other distributions can
be approximated by the normal distribution, for example:
• The binomial distribution B(n, p) is approximately normal N(np, np(1 − p)) for large n and
for p not too close to zero or one.
• The Poisson(λ) distribution is approximately normal N(λ, λ) for large values of λ.
• The chi-squared distribution χ2(k) is approximately normal N(k, 2k) for large k.
• The Student’s t-distribution t(ν) is approximately normal N(0, 1) when ν is large.
Whether these approximations are sufficiently accurate depends on the purpose for which they
are needed, and the rate of convergence to the normal distribution. It is typically the case that
such approximations are less accurate in the tails of the distribution.
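For the binomial case, the quality of the approximation can be measured against exact binomial probabilities; a sketch with n = 48, p = 1/4 (the parameters of the plot described above) and a continuity correction:

```python
import math

def binom_cdf(k, n, p):
    """Exact binomial P(X <= k), via math.comb."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

n, p = 48, 0.25
mu, sigma = n * p, math.sqrt(n * p * (1 - p))  # N(12, 9), so sigma = 3

# With a continuity correction, the normal approximation is close near the centre.
exact = binom_cdf(12, n, p)
approx = normal_cdf(12.5, mu, sigma)
assert abs(exact - approx) < 0.02
```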
A general upper bound for the approximation error in the central limit theorem is given by the
Berry–Esseen theorem.
[edit] Standard deviation and confidence intervals
Dark blue is less than one standard deviation from the mean. For the normal distribution, this
accounts for about 68% of the set (dark blue), while two standard deviations from the mean
(medium and dark blue) account for about 95%, and three standard deviations (light, medium,
and dark blue) account for about 99.7%.
About 68% of values drawn from a normal distribution lie within one standard deviation σ
of the mean μ; about 95% of the values lie within two standard deviations and about
99.7% lie within three standard deviations. This is known as the 68-95-99.7 rule, or the
empirical rule, or the 3-sigma rule.
To be more precise, the area under the bell curve between μ − nσ and μ + nσ in terms of the
cumulative normal distribution function is given by
F(μ + nσ) − F(μ − nσ) = Φ(n) − Φ(−n) = erf(n/√2),
where erf is the error function. To 12 decimal places, the values for the 1-, 2-, up to 6-sigma
points are:
n   erf(n/√2)
1   0.682689492137
2   0.954499736104
3   0.997300203937
4   0.999936657516
5   0.999999426697
6   0.999999998027
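These tabulated values are just erf(n/√2); a quick check:

```python
import math

def within_n_sigma(n):
    """P(|X - mu| < n*sigma) for a normal variable, equal to erf(n/sqrt(2))."""
    return math.erf(n / math.sqrt(2.0))

# The 68-95-99.7 rule, to the precision of the table above.
assert abs(within_n_sigma(1) - 0.682689492137) < 1e-11
assert abs(within_n_sigma(2) - 0.954499736104) < 1e-11
assert abs(within_n_sigma(3) - 0.997300203937) < 1e-11
```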
The next table gives the reverse relation of sigma multiples corresponding to a few often used
values for the area under the bell curve. These values are useful to determine (asymptotic)
confidence intervals of the specified levels based on normally distributed (or asymptotically
normal) estimators:
Area     n
0.80     1.281551565545
0.90     1.644853626951
0.95     1.959963984540
0.98     2.326347874041
0.99     2.575829303549
0.995    2.807033768344
0.998    3.090232306168
0.999    3.290526731492
0.9999   3.890591886413
0.99999  4.417173413469
where the value on the left of the table is the proportion of values that will fall within a given
interval and n is a multiple of the standard deviation that specifies the width of the interval.
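In Python, these multiples can be recovered from statistics.NormalDist, whose inv_cdf method is the quantile (probit) function; a sketch:

```python
from statistics import NormalDist

def two_sided_z(level):
    """Sigma multiple n such that P(|X - mu| < n*sigma) = level."""
    # A two-sided level corresponds to the one-sided (1 + level)/2 quantile.
    return NormalDist().inv_cdf((1.0 + level) / 2.0)

assert abs(two_sided_z(0.95) - 1.959963984540) < 1e-9
assert abs(two_sided_z(0.99) - 2.575829303549) < 1e-9
```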
[edit] Related and derived distributions
• If X is distributed normally with mean μ and variance σ2, then
○ The exponential of X is distributed log-normally: eX ~ ln N(μ, σ2).
○ The absolute value of X has the folded normal distribution: |X| ~ Nf(μ, σ2).
○ The square of X, scaled down by the variance σ2, has the non-central chi-square
distribution with 1 degree of freedom: X2/σ2 ~ χ21(μ2/σ2). If μ = 0, the distribution is
called simply chi-square.
○ Variable X restricted to an interval [a, b] is called the truncated normal
distribution.
○ (X − μ)−2 has a Lévy distribution with location 0 and scale σ−2.
• If X1 and X2 are two independent standard normal random variables, then
○ Their sum and difference is distributed normally with mean zero and variance
two: X1 ± X2∼N(0, 2).
○ Their product Z = X1 · X2 follows the product-normal distribution with density function[15]
f(z) = K0(|z|)/π,
where K0 is the modified Bessel function of the second kind. This distribution is
symmetric around zero, unbounded at z = 0, and has the characteristic
function φZ(t) = (1 + t2)−1/2.
○ Their ratio follows the standard Cauchy distribution: X1 ÷ X2∼ Cauchy(0, 1).
○ Their Euclidean norm √(X12 + X22) has the Rayleigh distribution, also known as the chi
distribution with 2 degrees of freedom.
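Several of these facts are easy to spot-check by simulation; a sketch for the sum and difference of two independent standard normals (seed and sample size are arbitrary choices):

```python
import random
from statistics import fmean, variance

random.seed(42)
n = 100_000
x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
x2 = [random.gauss(0.0, 1.0) for _ in range(n)]

s = [a + b for a, b in zip(x1, x2)]  # X1 + X2 ~ N(0, 2)
d = [a - b for a, b in zip(x1, x2)]  # X1 - X2 ~ N(0, 2)

# Sample mean near 0 and sample variance near 2, up to Monte Carlo error.
assert abs(fmean(s)) < 0.05 and abs(variance(s) - 2.0) < 0.1
assert abs(fmean(d)) < 0.05 and abs(variance(d) - 2.0) < 0.1
```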
• If X1, X2, …, Xn are independent standard normal random variables, then the sum of their
Maximizing this function with respect to μ and σ2 yields the maximum likelihood estimates
The estimator x̄ is called the sample mean, as it is the arithmetic mean of the sample observations.
The estimator σ̂2 is similarly called the sample variance. Sometimes instead of σ̂2 another
estimator is considered, s2, which differs from the former by having (n − 1) instead of n in the
denominator (the so-called Bessel’s correction):
s2 = (1/(n − 1)) Σi (xi − x̄)2.
This quantity s2 is also called the sample variance, and its square root s the sample standard
deviation. The difference between s2 and σ̂2 becomes negligibly small for large n.
These estimators have the following properties:
which implies that the standard error of x̄ is equal to σ/√n; that is, if one wishes to
decrease the standard error by a factor of 10, one must increase the number of samples by
a factor of 100. This fact is widely used in determining sample sizes for opinion polls and
the number of trials in Monte Carlo simulation.
• The statistic t = √n·(x̄ − μ)/s has Student’s t-distribution with (n − 1) degrees of freedom.
This t-statistic is pivotal (its distribution does not depend on the unknown parameters),
and is used for testing the hypothesis H0: μ = μ0 and in construction of confidence intervals.
• The 1−α confidence intervals for μ and σ2 are:
where q… denotes the quantile function. For large n it is possible to replace the quantiles
of the t- and χ²-distributions with the normal quantiles. For example, the approximate 95%
confidence interval for μ is x̄ ± 1.96·s/√n.
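The large-n approximate interval x̄ ± 1.96·s/√n can be sketched as follows (the sample data and parameters are illustrative only):

```python
import math
import random
from statistics import fmean, stdev

def ci_mean_95(sample):
    """Approximate 95% confidence interval for the mean: xbar +/- 1.96*s/sqrt(n)."""
    n = len(sample)
    xbar = fmean(sample)
    half_width = 1.959963984540 * stdev(sample) / math.sqrt(n)
    return xbar - half_width, xbar + half_width

random.seed(0)
sample = [random.gauss(5.0, 2.0) for _ in range(400)]
lo, hi = ci_mean_95(sample)

# The interval is centred on the sample mean, with half-width about 1.96*2/20 = 0.196.
assert lo < fmean(sample) < hi
assert hi - lo < 1.0
```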
The probability density function f(x, t) of the normal distribution with expected value 0 and variance t satisfies the diffusion equation:
∂f(x, t)/∂t = ½·∂2f(x, t)/∂x2.
If the mass-density at time t = 0 is given by a Dirac delta, which essentially means that all mass
is initially concentrated in a single point, then the mass-density function at time t will have the
form of the normal probability density function with variance linearly growing with t. This
connection is no coincidence: diffusion is due to Brownian motion which is mathematically
described by a Wiener process, and such a process at time t will also result in a normal
distribution with variance linearly growing with t.
More generally, if the initial mass-density is given by a function ϕ(x), then the mass-density at
time t will be given by the convolution of ϕ and a normal probability density function.
[edit] Generating values from normal distribution
For computer simulations, especially in applications of the Monte Carlo method, it is often useful to
generate values that have a normal distribution. All algorithms described here are concerned with
generating the standard normal, since a N(μ, σ2) can be generated as X = μ + σZ, where Z is
standard normal. The algorithms rely on the availability of a random number generator capable
of producing random values distributed uniformly.
• The most straightforward method is based on the probability integral transform property:
if U is distributed uniformly on (0,1), then Φ−1(U) will have the standard normal
distribution. The drawback of this method is that it relies on calculation of the probit
function Φ−1, which cannot be done analytically. Some approximate methods are
described in Hart (1968) and in the erf article.
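A minimal sketch of this inverse-transform method, using statistics.NormalDist.inv_cdf from the Python standard library as the probit function Φ−1:

```python
import random
from statistics import NormalDist, fmean, variance

random.seed(1)
probit = NormalDist().inv_cdf  # the quantile function Phi^{-1}

# If U ~ Uniform(0,1), then Phi^{-1}(U) is standard normal.
z = [probit(random.random()) for _ in range(100_000)]

# Sample mean near 0 and variance near 1, up to Monte Carlo error.
assert abs(fmean(z)) < 0.02
assert abs(variance(z) - 1.0) < 0.05
```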
• A simple approximate approach that is easy to program is as follows: simply sum 12
uniform (0,1) deviates and subtract 6 — the resulting random variable will have
approximately standard normal distribution. In truth, the distribution will be Irwin–Hall,
which is a 12-section eleventh-order polynomial approximation to the normal
distribution. This random deviate will have a limited range of (−6, 6).[17]
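This sum-of-12-uniforms recipe is a one-liner to sketch (sample size and seed are arbitrary):

```python
import random
from statistics import fmean, variance

random.seed(7)

def approx_gauss():
    """Irwin-Hall approximation: sum of 12 uniforms minus 6 is roughly N(0, 1)."""
    return sum(random.random() for _ in range(12)) - 6.0

z = [approx_gauss() for _ in range(100_000)]

# Mean 0 and variance 12 * (1/12) = 1, up to Monte Carlo error.
assert abs(fmean(z)) < 0.02
assert abs(variance(z) - 1.0) < 0.05
assert all(-6.0 < v < 6.0 for v in z)  # the range is limited to (-6, 6)
```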
• The Box–Muller method uses two independent random numbers U and V distributed
uniformly on (0,1]. Then two random variables X and Y
will both have the standard normal distribution, and be independent. This formulation
arises because for a bivariate normal random vector (X Y) the squared norm X2 + Y2 will
have the chi-square distribution with two degrees of freedom, which is an easily-
generated exponential random variable corresponding to the quantity −2ln(U) in these
equations; and the angle is distributed uniformly around the circle, chosen by the random
variable V.
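A sketch of the Box–Muller transform as described (shifting U from [0,1) to (0,1] guards the logarithm; variable names are illustrative):

```python
import math
import random
from statistics import fmean, variance

random.seed(3)

def box_muller():
    """Box-Muller transform: two uniforms on (0,1] -> two independent N(0,1) values."""
    u = 1.0 - random.random()  # shift [0,1) to (0,1] so log(u) is finite
    v = random.random()
    r = math.sqrt(-2.0 * math.log(u))  # radius: chi-distributed with 2 df
    return r * math.cos(2.0 * math.pi * v), r * math.sin(2.0 * math.pi * v)

z = [x for _ in range(50_000) for x in box_muller()]  # 100,000 values

assert abs(fmean(z)) < 0.02
assert abs(variance(z) - 1.0) < 0.05
```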
• Marsaglia polar method is a modification of the Box–Muller method algorithm, which
does not require computation of functions sin() and cos(). In this method U and V are
drawn from the uniform (−1,1) distribution, and then S = U2 + V2 is computed. If S is
greater or equal to one then the method starts over, otherwise two quantities
are returned. Again, X and Y here will be independent and standard normally distributed.
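A sketch of the Marsaglia polar method as described (the rejection of S = 0 is an extra guard for the logarithm):

```python
import math
import random
from statistics import fmean, variance

random.seed(11)

def marsaglia_polar():
    """Marsaglia polar method: rejection-sample the unit disk, no sin/cos needed."""
    while True:
        u = random.uniform(-1.0, 1.0)
        v = random.uniform(-1.0, 1.0)
        s = u * u + v * v
        if 0.0 < s < 1.0:  # reject points outside the unit disk (and the origin)
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return u * factor, v * factor

z = [x for _ in range(50_000) for x in marsaglia_polar()]  # 100,000 values

assert abs(fmean(z)) < 0.02
assert abs(variance(z) - 1.0) < 0.05
```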
• The ziggurat algorithm (Marsaglia & Tsang 2000) is faster than the Box–Muller
transform and still exact. In about 97% of all cases it uses only two random numbers, one
random integer and one random uniform, one multiplication and an if-test. Only in 3% of
the cases where the combination of those two falls outside the “core of the ziggurat” a
kind of rejection sampling using logarithms, exponentials and more uniform random
numbers has to be employed.
• There is also some investigation into the connection between the fast Hadamard
transform and the normal distribution, since the transform employs just addition and
subtraction and by the central limit theorem random numbers from almost any
distribution will be transformed into the normal distribution. In this regard a series of
Hadamard transforms can be combined with random permutations to turn arbitrary data
sets into a normally-distributed data.
[edit] Numerical approximations of the normal cdf
The standard normal cdf is widely used in scientific and statistical computing. Different
approximations are used depending on the desired level of accuracy.
• Abramowitz & Stegun (1964) give an approximation for Φ(x) with absolute error
|ε(x)| < 7.5·10−8 (algorithm 26.2.17).
• Marsaglia (2004) suggests a simple algorithm based on a Taylor series expansion
for calculating Φ(x) with arbitrary precision. The drawback of this algorithm is
comparatively slow calculation time (for example it takes over 300 iterations to calculate
the function with 16 digits of precision when x = 10).
• The GNU Scientific Library calculates values of the standard normal cdf using Hart’s
algorithms and approximations with Chebyshev polynomials.
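The Abramowitz & Stegun approximation 26.2.17 mentioned above can be implemented directly; the coefficients below are the published ones (valid for x ≥ 0), with the erf-based cdf used as the reference:

```python
import math

# Coefficients of Abramowitz & Stegun formula 26.2.17 (for x >= 0).
P = 0.2316419
B = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)

def phi_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def cdf_as(x):
    """Polynomial-in-t approximation of the standard normal cdf, |error| < 7.5e-8."""
    t = 1.0 / (1.0 + P * x)
    poly = sum(b * t ** (k + 1) for k, b in enumerate(B))
    return 1.0 - phi_pdf(x) * poly

def cdf_exact(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# The approximation stays within its advertised error bound.
for x in (0.0, 0.5, 1.0, 2.0, 4.0):
    assert abs(cdf_as(x) - cdf_exact(x)) < 1e-7
```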
For a more detailed discussion of how to calculate the normal distribution, see Knuth’s The
Art of Computer Programming, volume 2, section 3.4.1C.
Sir Francis Galton (Natural Inheritance, 1889) described the Central Limit Theorem as:
The actual term "central limit theorem" (in German: "zentraler Grenzwertsatz") was first used by
George Pólya in 1920 in the title of a paper.[2](Le Cam 1986) Pólya referred to the theorem as
"central" due to its importance in probability theory. According to Le Cam, the French school of
probability interprets the word central in the sense that "it describes the behaviour of the centre
of the distribution as opposed to its tails" (Le Cam 1986). The abstract of the paper On the
central limit theorem of calculus of probability and the problem of moments by Pólya in 1920
translates as follows.
A thorough account of the theorem's history, detailing Laplace's foundational work, as well as
Cauchy's, Bessel's and Poisson's contributions, is provided by Hald.[3] Two historic accounts, one
covering the development from Laplace to Cauchy, the second the contributions by von Mises,
Pólya, Lindeberg, Lévy, and Cramér during the 1920s, are given by Hans Fischer.[4] A period
around 1935 is described in (Le Cam 1986). See Bernstein (1945) for a historical discussion
focusing on the work of Pafnuty Chebyshev and his students Andrey Markov and Aleksandr
Lyapunov that led to the first proofs of the CLT in a general setting.
A curious footnote to the history of the Central Limit Theorem is that a proof of a result similar
to the 1922 Lindeberg CLT was the subject of Alan Turing's 1934 Fellowship Dissertation for
King's College at the University of Cambridge. Only after submitting the work did Turing learn
it had already been proved. Consequently, Turing's dissertation was never published.[5][6]
[edit] Classical central limit theorem
A distribution being "smoothed out" by summation, showing original density of distribution and
three subsequent summations; see Illustration of the central limit theorem for further details.
The central limit theorem is also known as the second fundamental theorem of probability.[citation
needed]
(The Law of large numbers is the first.)
Let X1, X2, X3, …, Xn be a sequence of n independent and identically distributed (iid) random
variables each having finite expectation µ and variance σ2 > 0. The central limit theorem
states[citation needed] that as the sample size n increases, the distribution of the sample average of these
random variables approaches the normal distribution with mean µ and variance σ2/n,
irrespective of the shape of the common distribution of the individual terms Xi.
For a more precise statement of the theorem, let Sn be the sum of the n random variables, given
by
Sn = X1 + X2 + … + Xn,
and let Zn = (Sn − nμ)/(σ√n) be the standardized sums. Then the Zn converge in distribution to the
standard normal distribution N(0,1) as n approaches infinity. N(0,1) is thus the asymptotic
distribution of the Zn’s. This is often written as Zn → N(0,1) in distribution as n → ∞.
[edit] Proof
For a theorem of such fundamental importance to statistics and applied probability, the central
limit theorem has a remarkably simple proof using characteristic functions. It is similar to the
proof of a (weak) law of large numbers. For any random variable Y with zero mean and unit
variance (var(Y) = 1), the characteristic function of Y is, by Taylor’s theorem,
φY(t) = 1 − t2/2 + o(t2)  as t → 0,
where o(t2) is “little o notation” for some function of t that goes to zero more rapidly than t2.
Letting Yi be (Xi − μ)/σ, the standardized value of Xi, it is easy to see that the standardized mean
of the observations X1, X2, ..., Xn is Zn = (Y1 + Y2 + … + Yn)/√n, with characteristic function
φZn(t) = [φY(t/√n)]n = [1 − t2/(2n) + o(t2/n)]n → exp(−t2/2) as n → ∞.
But this limit is just the characteristic function of a standard normal distribution N(0, 1), and the
central limit theorem follows from the Lévy continuity theorem, which confirms that the
convergence of characteristic functions implies convergence in distribution.
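The limiting step of the proof, [1 − t2/(2n) + o(t2/n)]n → exp(−t2/2), can be observed numerically by dropping the o(·) term:

```python
import math

def cf_power(t, n):
    """n-th power of the truncated Taylor characteristic function: (1 - t^2/(2n))^n."""
    return (1.0 - t * t / (2.0 * n)) ** n

t = 1.0
target = math.exp(-t * t / 2.0)  # exp(-1/2), the standard normal cf at t = 1

# The error shrinks as n grows, approaching the Gaussian characteristic function.
assert abs(cf_power(t, 10) - target) > abs(cf_power(t, 1000) - target)
assert abs(cf_power(t, 1_000_000) - target) < 1e-6
```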
[edit] Convergence to the limit
The central limit theorem gives only an asymptotic distribution. As an approximation for a finite
number of observations, it provides a reasonable approximation only when close to the peak of
the normal distribution; it requires a very large number of observations to stretch into the tails.
If the third central moment E((X1 − μ)3) exists and is finite, then the above convergence is
uniform and the speed of convergence is at least on the order of n−1/2 (see Berry–Esséen
theorem).
The convergence to the normal distribution is monotonic, in the sense that the entropy of Zn
increases monotonically to that of the normal distribution, as proven in Artstein, Ball, Barthe and
Naor (2004).
The central limit theorem applies in particular to sums of independent and identically distributed
discrete random variables. A sum of discrete random variables is still a discrete random variable,
so that we are confronted with a sequence of discrete random variables whose cumulative
probability distribution function converges towards a cumulative probability distribution
function corresponding to a continuous variable (namely that of the normal distribution). This
means that if we build a histogram of the realisations of the sum of n independent identical
discrete variables, the curve that joins the centers of the upper faces of the rectangles forming the
histogram converges toward a Gaussian curve as n approaches infinity. The binomial distribution
article details such an application of the central limit theorem in the simple case of a discrete
variable taking only two possible values.
[edit] Relation to the law of large numbers
The law of large numbers as well as the central limit theorem are partial solutions to a general
problem: "What is the limiting behavior of Sn as n approaches infinity?" In mathematical
analysis, asymptotic series are one of the most popular tools employed to approach such
questions.
Suppose we have an asymptotic expansion of ƒ(n):
ƒ(n) = a1φ1(n) + a2φ2(n) + O(φ3(n)).
Dividing both parts by φ1(n) and taking the limit will produce a1, the coefficient of the highest-
order term in the expansion, which represents the rate at which ƒ(n) changes in its leading term.
Informally, one can say: “ƒ(n) grows approximately as a1φ1(n)”. Taking the difference between
ƒ(n) and its approximation and then dividing by the next term in the expansion, we arrive at a
more refined statement about ƒ(n):
(ƒ(n) − a1φ1(n)) / φ2(n) → a2.
Here one can say that the difference between the function and its approximation grows
approximately as a2 φ2(n). The idea is that dividing the function by appropriate normalizing
functions, and looking at the limiting behavior of the result, can tell us much about the limiting
behavior of the original function itself.
Informally, something along these lines is happening when the sum, Sn, of independent
identically distributed random variables, X1, ..., Xn, is studied in classical probability theory. If
each Xi has finite mean μ, then by the Law of Large Numbers, Sn/n → μ.[7] If in addition each Xi
has finite variance σ2, then by the Central Limit Theorem,
where ξ is distributed as N(0, σ2). This provides values of the first two constants in the informal
expansion
In the case where the Xi's do not have finite mean or variance, convergence of the shifted and
rescaled sum can also occur with different centering and scaling factors:
or informally
Distributions Ξ which can arise in this way are called stable.[8] Clearly, the normal distribution is
stable, but there are also other stable distributions, such as the Cauchy distribution, for which the
mean or variance are not defined. The scaling factor bn may be proportional to nc, for any c ≥ 1/2;
it may also be multiplied by a slowly varying function of n.[9][10]
The Law of the Iterated Logarithm tells us what is happening "in between" the Law of Large
Numbers and the Central Limit Theorem. Specifically it says that the normalizing function
If for some δ > 0 the expected values E[|Xi|2+δ] are finite for every i, and Lyapunov’s condition
lim n→∞ (1/sn2+δ) Σi=1n E[|Xi − μi|2+δ] = 0, where sn2 = σ12 + … + σn2,
is satisfied (a closely related sufficient requirement is Lindeberg’s condition,
lim n→∞ (1/sn2) Σi=1n E[(Xi − μi)2·1{|Xi − μi| > εsn}] = 0 for every ε > 0,
where 1{…} is the indicator function), then the distribution of the standardized sum Zn converges
towards the standard normal distribution N(0,1).
[edit] Beyond the classical framework
Asymptotic normality, that is, convergence to the normal distribution after appropriate shift and
rescaling, is a phenomenon much more general than the classical framework treated above,
namely, sums of independent random variables (or vectors). New frameworks are revealed from
time to time; no single unifying framework is available for now.
[edit] Under weak dependence
A useful generalization of a sequence of independent, identically distributed random variables is
a mixing random process in discrete time; "mixing" means, roughly, that random variables
temporally far apart from one another are nearly independent. Several kinds of mixing are used
in ergodic theory and probability theory. See especially strong mixing (also called α-mixing)
• in probability as n tends to
infinity,
• for every as
n tends to infinity,
expectation
[edit] Convex bodies
Theorem (Klartag 2007, Theorem 1.2). There exists a sequence for which the following
holds. Let , and let random variables have a log-concave joint density f such
Theorem (Klartag 2008, Theorem 1). Let satisfy the assumptions of the previous
theorem, then
such that
A more general case is treated in (Klartag 2007, Theorem 1.1). The condition
• 0 ≤ ak< 2π.
Then
A is a fixed n×n matrix such that , and let . Then the distribution of
that weakly in L2(Ω) and weakly in L1(Ω). Then there exist integers
A histogram plot of monthly accidental deaths in the US, between 1973 and 1978 exhibits
normality, due to the central limit theorem
There are a number of useful and interesting examples and applications arising from the central
limit theorem (Dinov, Christou & Sanchez 2008). See e.g. [1], presented as part of the SOCR
CLT Activity.
• The probability distribution for total distance covered in a random walk (biased or
unbiased) will tend toward a normal distribution.
• Flipping a large number of coins will result in a normal distribution for the total number
of heads (or equivalently total number of tails).
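The coin-flipping bullet above can be checked by simulation. The following sketch (the sample
sizes are arbitrary choices) repeats runs of 400 fair tosses and confirms that the totals cluster
like a normal distribution with mean n/2 = 200 and standard deviation √n/2 = 10:

```python
import random
import statistics

def heads_count(n_flips, rng):
    """Total number of heads in n_flips fair-coin tosses."""
    return sum(rng.random() < 0.5 for _ in range(n_flips))

rng = random.Random(0)
n_flips = 400
totals = [heads_count(n_flips, rng) for _ in range(2000)]

mu = statistics.mean(totals)
sigma = statistics.stdev(totals)
# For an approximately normal variable, about 68% of observations fall
# within one standard deviation of the mean.
within = sum(abs(t - 200) <= 10 for t in totals) / len(totals)
```

The sample mean of the totals lands near 200, the sample standard deviation near 10, and
roughly two thirds of the totals fall in [190, 210], as the normal approximation predicts.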
From another viewpoint, the central limit theorem explains the common appearance of the "Bell
Curve" in density estimates applied to real world data. In cases like electronic noise, examination
grades, and so on, we can often regard a single measured value as the weighted average of a
large number of small effects. Using generalisations of the central limit theorem, we can then see
that this would often (though not always) produce a final distribution that is approximately
normal.
In general, the more a measurement is like the sum of independent variables with equal influence
on the result, the more normality it exhibits. This justifies the common use of this distribution to
stand in for the effects of unobserved variables in models like the linear model.
[edit] Signal processing
Signals can be smoothed by applying a Gaussian filter, which is just the convolution of a signal
with an appropriately scaled Gaussian function. Due to the central limit theorem this smoothing
can be approximated by several filter steps that can be computed much faster, like the simple
moving average.
The central limit theorem implies that to achieve a Gaussian of variance σ², n filters with
windows of variances σ₁², …, σₙ² with σ² = σ₁² + ⋯ + σₙ² must be applied.
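The cascaded-filter claim can be checked numerically: convolving a normalized box window (a
simple moving average) with itself adds variances, so a few passes already resemble a Gaussian
of the summed variance. A minimal pure-Python sketch (the window width 5 and the number of
passes are arbitrary choices):

```python
def convolve(a, b):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def variance(kernel):
    """Variance of a normalized kernel viewed as a distribution on 0..len-1."""
    mean = sum(i * w for i, w in enumerate(kernel))
    return sum((i - mean) ** 2 * w for i, w in enumerate(kernel))

box = [1.0 / 5] * 5   # moving-average window of width 5, variance (5^2 - 1)/12 = 2
k = box
for _ in range(3):    # four box passes in total
    k = convolve(k, box)
# Variances add under convolution, so the four-pass kernel has variance 8
# and its shape is already close to a Gaussian of that variance.
```

This is exactly the sense in which repeated moving averages approximate a Gaussian filter.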
Cauchy distribution
From Wikipedia, the free encyclopedia
Not to be confused with the Lorenz curve.
Cauchy–Lorentz
location (real)
parameters:
scale (real)
support:
pdf:
cdf:
median: x0
mode: x0
cf:
The Cauchy–Lorentz distribution, named after Augustin Cauchy and Hendrik Lorentz, is a
continuous probability distribution. As a probability distribution, it is known as the Cauchy
distribution, while among physicists, it is known as the Lorentz distribution, Lorentz(ian)
function, or Breit–Wigner distribution.
Its importance in physics is due to its being the solution to the differential equation describing
forced resonance.[citation needed] In mathematics, it is closely related to the Poisson kernel, which is
the fundamental solution for the Laplace equation in the upper half-plane. In spectroscopy, it is
the description of the shape of spectral lines which are subject to homogeneous broadening in
which all atoms interact in the same way with the frequency range contained in the line shape.
Many mechanisms cause homogeneous broadening, most notably collision broadening, and
Chantler–Alda radiation.[1]
[edit] Characterization
[edit] Probability density function
The Cauchy distribution has the probability density function
f(x; x0, γ) = (1/π) · γ / ((x − x0)² + γ²),
where x0 is the location parameter, specifying the location of the peak of the distribution, and γ is
the scale parameter which specifies the half-width at half-maximum (HWHM). γ is also equal to
half the interquartile range. Cauchy himself exploited such a density function in 1827, with
infinitesimal scale parameter, in defining a Dirac delta function (see there).
The amplitude (peak height) of the above Lorentzian function is given by 1/(πγ).
[edit] Properties
The Cauchy distribution is an example of a distribution which has no mean, variance or higher
moments defined. Its mode and median are well defined and are both equal to x0.
When U and V are two independent normally distributed random variables with expected value 0
and variance 1, then the ratio U/V has the standard Cauchy distribution.
If X1, ..., Xn are independent and identically distributed random variables, each with a standard
Cauchy distribution, then the sample mean (X1 + ... + Xn)/n has the same standard Cauchy
distribution (the sample median, which is not affected by extreme values, can be used as a
measure of central tendency). To see that this is true, compute the characteristic function of the
sample mean:
φx̄(t) = E[e^(i t x̄)] = (e^(−|t|/n))^n = e^(−|t|),
where x̄ is the sample mean. This example serves to show that the hypothesis of finite variance
in the central limit theorem cannot be dropped. It is also an example of a more generalized
version of the central limit theorem that is characteristic of all stable distributions, of which the
Cauchy distribution is a special case.
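This failure of averaging can be demonstrated by simulation. The sketch below (the sample
sizes are arbitrary choices) draws standard Cauchy variates by inverse transform and checks
that means of 100 draws are just as spread out as single draws:

```python
import math
import random

def standard_cauchy(rng):
    """Draw from Cauchy(0, 1) by inverse transform: x = tan(pi*(u - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(42)
sample_means = []
for _ in range(2000):
    xs = [standard_cauchy(rng) for _ in range(100)]
    sample_means.append(sum(xs) / len(xs))

# If the CLT applied, means of 100 draws would cluster with spread ~1/10.
# Instead the means are themselves standard Cauchy: their interquartile
# range stays near 2 (the IQR of Cauchy(0, 1)) regardless of sample size.
sample_means.sort()
iqr = sample_means[1500] - sample_means[500]
```

The interquartile range of the 2000 sample means stays close to 2, exactly as for individual
standard Cauchy draws.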
The Cauchy distribution is an infinitely divisible probability distribution. It is also a strictly
stable distribution.
The standard Cauchy distribution coincides with the Student's t-distribution with one degree of
freedom.
Like all stable distributions, the location-scale family to which the Cauchy distribution belongs is
closed under linear transformations with real coefficients. In addition, the Cauchy distribution is
the only univariate distribution which is closed under linear fractional transformations with real
coefficients. In this connection, see also McCullagh's parametrization of the Cauchy
distributions.
[edit] Characteristic function
Let X denote a Cauchy distributed random variable. The characteristic function of the Cauchy
distribution is given by
φX(t) = E[e^(i X t)] = e^(i x0 t − γ|t|),
which is just the Fourier transform of the probability density. It follows that the probability may
be expressed in terms of the characteristic function by:
If at most one of the two terms in (2) is infinite, then (1) is the same as (2). But in the case of the
Cauchy distribution, both the positive and negative terms of (2) are infinite. This means (2) is
undefined. Moreover, if (1) is construed as a Lebesgue integral, then (1) is also undefined, since
(1) is then defined simply as the difference (2) between positive and negative parts.
However, if (1) is construed as an improper integral rather than a Lebesgue integral, then (2) is
undefined, and (1) is not necessarily well-defined. We may take (1) to mean
and this is its Cauchy principal value, which is zero, but we could also take (1) to mean, for
example,
Although the sample values xi will be concentrated about the central value x0, the sample mean
will become increasingly variable as more samples are taken, due to the increased likelihood of
encountering sample points with a large absolute value. In fact, the distribution of the sample
mean will be equal to the distribution of the samples themselves. Similarly, calculating the
sample variance will result in values that grow larger as more samples are taken.
More robust means of estimating the central value x0 and the scaling parameter γ are needed.
For example, a simple method is to take the median value of the sample as an estimator of x0 and
half the sample interquartile range as an estimator of γ. Other, more precise and robust methods
have been developed.[2]
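The simple median/IQR method can be sketched as follows (the true parameters x0 = 5, γ = 2
and the sample size are made-up choices for illustration):

```python
import math
import random

def cauchy(x0, gamma, rng):
    """Draw from Cauchy(x0, gamma) by inverse transform."""
    return x0 + gamma * math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(7)
xs = sorted(cauchy(5.0, 2.0, rng) for _ in range(10001))

x0_hat = xs[5000]                      # sample median estimates x0
gamma_hat = (xs[7500] - xs[2500]) / 2  # half the sample IQR estimates gamma
```

Unlike the sample mean, both estimators concentrate near the true values as the sample grows.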
[edit] Related distributions
• The ratio of two independent standard normal random variables is a standard Cauchy
variable, a Cauchy(0,1). Thus the Cauchy distribution is a ratio distribution.
• The standard Cauchy(0,1) distribution arises as a special case of Student's t distribution
with one degree of freedom.
• Relation to stable distribution: if X ~ Stable(1, 0, γ, μ), then X ~ Cauchy(μ, γ).
Lévy distribution
For the more general family of Lévy alpha-stable distributions, of which this distribution is a
special case, see stable distribution.
Lévy (unshifted)
support:
pdf:
cdf:
mean: infinite
median:
mode:
variance: infinite
skewness: undefined
kurtosis: undefined
entropy:
γ is Euler gamma
mgf: undefined
cf:
In probability theory and statistics, the Lévy distribution, named after Paul Pierre Lévy, is a
continuous probability distribution for a non-negative random variable. In spectroscopy this
distribution, with frequency as the dependent variable, is known as a van der Waals profile.[note 1]
It is one of the few distributions that are stable and that have probability density functions that
are analytically expressible, the others being the normal distribution and the Cauchy distribution.
All three are special cases of the stable distributions, which do not generally have an
analytically expressible probability density.
[edit] Definition
The probability density function of the Lévy distribution over the domain x ≥ μ is
f(x; μ, c) = sqrt(c / (2π)) · e^(−c / (2(x − μ))) / (x − μ)^(3/2),
where μ is the location parameter and c is the scale parameter. The cumulative distribution
function is
F(x; μ, c) = erfc(sqrt(c / (2(x − μ)))),
where erfc(z) is the complementary error function. The shift parameter μ has the effect of
shifting the curve to the right by an amount μ, and changing the support to the interval [μ, ∞).
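These standard formulas, density sqrt(c/(2π)) · e^(−c/(2(x−μ))) / (x−μ)^(3/2) and CDF
erfc(sqrt(c/(2(x−μ)))), can be written down and sanity-checked with only the standard library:

```python
import math

def levy_pdf(x, mu=0.0, c=1.0):
    """Levy density: zero at or below mu, heavy-tailed above it."""
    if x <= mu:
        return 0.0
    return (math.sqrt(c / (2 * math.pi))
            * math.exp(-c / (2 * (x - mu)))
            / (x - mu) ** 1.5)

def levy_cdf(x, mu=0.0, c=1.0):
    """Levy cumulative distribution function via the complementary error function."""
    if x <= mu:
        return 0.0
    return math.erfc(math.sqrt(c / (2 * (x - mu))))
```

A numerical integration of the density from μ reproduces the CDF, confirming the two formulas
are consistent.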
Like all stable distributions, the Lévy distribution has a standard form f(x; 0, 1) which has the
following property:
f(x; μ, c) = (1/c) · f(y; 0, 1),
where y is defined as
y = (x − μ) / c.
Note that the characteristic function can also be written in the same form used for the stable
distribution with α = 1 / 2 and β = 1:
Assuming μ = 0, the nth moment of the unshifted Lévy distribution is formally defined by:
which diverges for all n > 0 so that the moments of the Lévy distribution do not exist. The
moment generating function is then formally defined by:
which diverges for t > 0 and is therefore not defined in an interval around zero, so that the
moment generating function is not defined per se. Like all stable distributions except the normal
distribution, the wing of the probability density function exhibits heavy tail behavior falling off
according to a power law:
f(x; μ, c) ~ sqrt(c / (2π)) / x^(3/2)  as x → ∞.
This is illustrated in the diagram below, in which the probability density functions for various
values of c and μ = 0 are plotted on a log-log scale.
[edit] Applications
• The Lévy distribution is of interest to the financial modeling community due to its
empirical similarity to the returns of securities.
• It is claimed that fruit flies follow a form of the distribution to find food (Lévy flight).[1]
• The frequency of geomagnetic reversals appears to follow a Lévy distribution.
• The time of hitting a single point (different from the starting point 0) by the Brownian
motion has the Lévy distribution.
• The length of the path followed by a photon in a turbid medium follows the Lévy
distribution.[2]
• The Lévy distribution has been used post 1987 crash by the Options Clearing Corporation
for setting margin requirements because its parameters are more robust to extreme events
than those of a normal distribution, and thus extreme events do not suddenly increase
margin requirements which may worsen a crisis.[3]
• The statistics of solar flares are described by a non-Gaussian distribution. The solar flare
statistics were shown to be describable by a Lévy distribution and it was assumed that
intermittent solar flares perturb the intrinsic fluctuations in Earth’s average temperature.
The end result of this perturbation is that the statistics of the temperature anomalies
inherit the statistical structure that was evident in the intermittency of the solar flare data.
[4]
Probability distribution
In probability theory and statistics, a probability distribution identifies either the probability of
each value of a random variable (when the variable is discrete), or the probability of the value
falling within a particular interval (when the variable is continuous).[1] The
probability distribution describes the range of possible values that a random variable can attain
and the probability that the value of the random variable is within any (measurable) subset of that
range.
When the random variable takes values in the set of real numbers, the probability distribution is
completely described by the cumulative distribution function, whose value at each real x is the
probability that the random variable is smaller than or equal to x.
The concept of the probability distribution and the random variables which they describe
underlies the mathematical discipline of probability theory, and the science of statistics. There is
spread or variability in almost any value that can be measured in a population (e.g. height of
people, durability of a metal, etc.); almost all measurements are made with some intrinsic error;
in physics many processes are described probabilistically, from the kinetic properties of gases to
the quantum mechanical description of fundamental particles. For these and many other reasons,
simple numbers are often inadequate for describing a quantity, while probability distributions are
often more appropriate.
Various probability distributions show up in different applications. One of the more important
ones is the normal distribution, which is also known as the Gaussian distribution or the bell
curve and approximates many different naturally occurring distributions.
The toss of a fair coin yields another familiar distribution, where the possible values are heads or
tails, each with probability 1/2.
[edit] Formal definition
In the measure-theoretic formalization of probability theory, a random variable is defined as a
measurable function X from a probability space (Ω, F, P) to its observation space (X, A). A
probability distribution is the pushforward measure X*P = P X⁻¹ on (X, A).
In other words, a probability distribution is a probability measure over the observation space
instead of the underlying probability space.
[edit] Probability distributions of real-valued random variables
Because a probability distribution Pr on the real line is determined by the probability of a real-
valued random variable X being in a half-open interval (−∞, x], the probability distribution is
completely characterized by its cumulative distribution function:
F(x) = Pr[X ≤ x]  for all x ∈ R.
A probability distribution is called discrete if its cumulative distribution function only increases
in jumps. More precisely, a probability distribution is discrete if there is a finite or countable set
whose probability is 1.
For many familiar discrete distributions, the set of possible values is topologically discrete in the
sense that all its points are isolated points. But, there are discrete distributions for which this
countable set is dense on the real line.
Discrete distributions are characterized by a probability mass function p such that
Pr[X = x] = p(x).
Absolutely continuous distributions are characterized by a probability density
function: a non-negative Lebesgue integrable function f defined on the real numbers such that
F(x) = ∫ from −∞ to x of f(t) dt.
Discrete distributions and some continuous distributions (like the Cantor distribution) do not
admit such a density.
[edit] Terminology
The support of a distribution is the smallest closed interval/set whose complement has
probability zero. It may be understood as the points or elements that are actual members of the
distribution.
A discrete random variable is a random variable whose probability distribution is discrete.
Similarly, a continuous random variable is a random variable whose probability distribution is
continuous.
[edit] Simulated sampling
Main article: Inverse transform sampling
If one is programming and one wishes to sample from a probability distribution (either discrete
or continuous), the following algorithm lets one do so. This algorithm assumes that one has
access to the inverse of the cumulative distribution (easy to calculate with a discrete distribution,
can be approximated for continuous distributions) and a computational primitive called
"random()" which returns an arbitrary-precision floating-point-value in the range of [0,1).
define function sampleFrom(cdfInverse (type="function")):
  // input:
  //   cdfInverse(x) - the inverse of the CDF of the probability distribution
  //     example: if the distribution is [[Gaussian]], one can use a [[Taylor
  //       approximation]] of the inverse of [[erf]](x)
  //     example: if the distribution is discrete, see the explanation below
  // output:
  //   type="real number" - a value sampled from the probability distribution
  //     represented by cdfInverse
  r = random()
  return cdfInverse(r)
For discrete distributions, the function cdfInverse (inverse of cumulative distribution function)
can be calculated from samples as follows: for each element in the sample range (discrete values
along the x-axis), calculate the total number of samples before it. Normalize this new discrete
distribution. This new discrete distribution is the CDF, and can be turned into an object which
acts like a function: calling cdfInverse(query) returns the smallest x-value such that the CDF is
greater than or equal to the query.
define function dataToCdfInverse(discreteDistribution (type="dictionary"))
  // input:
  //   discreteDistribution - a mapping from possible values to
  //     frequencies/probabilities
  //     example: {0 -> 1-p, 1 -> p} would be a [[Bernoulli distribution]]
  //       with chance=p
  //     example: setting p=0.5 in the above example, this is a [[fair coin]]
  //       where P(X=1)->"heads" and P(X=0)->"tails"
  // output:
  //   type="function" - a function that represents (CDF^-1)(x)
Central tendency
In statistics, the term central tendency relates to the way in which quantitative data tend to
cluster around some value.[1] A measure of central tendency is any of a number of ways of
specifying this "central value". In practical statistical analyses, the terms are often used before
one has chosen even a preliminary form of analysis: thus an initial objective might be to "choose
an appropriate measure of central tendency".
In the simplest cases, the measure of central tendency is an average of a set of measurements, the
word average being variously construed as mean, median, or other measure of location,
depending on the context. However, the term is applied to multidimensional data as well as to
univariate data and in situations where a transformation of the data values for some or all
dimensions would usually be considered necessary: in the latter cases, the notion of a "central
location" is retained in converting an "average" computed for the transformed data back to the
original units. In addition, there are several different kinds of calculations for central tendency,
where the kind of calculation depends on the type of data (level of measurement).
Both "central tendency" and "measure of central tendency" apply to either statistical populations
or to samples from a population.
[edit] Basic measures of central tendency
The following may be applied to individual dimensions of multidimensional data, after
transformation, although some of these involve their own implicit transformation of the data.
• Arithmetic mean - the sum of all measurements divided by the number of
observations in the data set
• Median - the middle value that separates the higher half from the lower half
of the data set
• Mode - the most frequent value in the data set
• Geometric mean - the nth root of the product of the data values
• Harmonic mean - the reciprocal of the arithmetic mean of the reciprocals of
the data values
• Weighted mean - an arithmetic mean that incorporates weighting to certain
data elements
• Truncated mean - the arithmetic mean of data values after a certain number
or proportion of the highest and lowest data values have been discarded.
• Midrange - the arithmetic mean of the maximum and minimum values of a
data set.
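Most of these measures can be computed directly with Python's standard `statistics` module; a
small worked example on the data set {2, 4, 4, 8}:

```python
import statistics

data = [2.0, 4.0, 4.0, 8.0]

arithmetic = statistics.mean(data)            # (2 + 4 + 4 + 8) / 4 = 4.5
median = statistics.median(data)              # middle of the sorted data: 4.0
mode = statistics.mode(data)                  # most frequent value: 4.0
geometric = statistics.geometric_mean(data)   # (2 * 4 * 4 * 8) ** (1/4) = 4.0
harmonic = statistics.harmonic_mean(data)     # 4 / (1/2 + 1/4 + 1/4 + 1/8)
midrange = (min(data) + max(data)) / 2        # (2 + 8) / 2 = 5.0
```

Note that the four means already illustrate the usual ordering: harmonic ≤ geometric ≤
arithmetic for positive data.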
Geometric mean
The geometric mean, in mathematics, is a type of mean or average, which indicates the central
tendency or typical value of a set of numbers. It is similar to the arithmetic mean, which is what
most people think of with the word "average", except that instead of adding the set of numbers
and then dividing the sum by the count of numbers in the set, n, the numbers are multiplied and
then the nth root of the resulting product is taken.
For instance, the geometric mean of two numbers, say 2 and 8, is just the square root of their
product, which equals 4; that is √(2 × 8) = 4. As another example, the geometric mean of the three
numbers 1, ½, ¼ is the cube root of their product (1/8), which is 1/2; that is ∛(1 × ½ × ¼) = ½.
The geometric mean can also be understood in terms of geometry. The geometric mean of two
numbers, a and b, is the length of one side of a square whose area is equal to the area of a
rectangle with sides of lengths a and b. Similarly, the geometric mean of three numbers, a, b, and
c, is the length of one side of a cube whose volume is the same as that of a cuboid with sides
whose lengths are equal to the three given numbers.
The geometric mean only applies to positive numbers.[1] It is also often used for a set of numbers
whose values are meant to be multiplied together or are exponential in nature, such as data on the
growth of the human population or interest rates of a financial investment. The geometric mean
is also one of the three classic Pythagorean means, together with the aforementioned arithmetic
mean and the harmonic mean.
[edit] Calculation
The geometric mean of a data set [a1, a2, ..., an] is given by
(a1 · a2 ⋯ an)^(1/n).
The geometric mean of a data set is less than or equal to the data set's arithmetic mean (the two
means are equal if and only if all members of the data set are equal). This allows the definition of
the arithmetic-geometric mean, a mixture of the two which always lies in between.
The geometric mean is also the arithmetic-harmonic mean in the sense that if two sequences
(an) and (hn) are defined:
a_(n+1) = (a_n + h_n) / 2, with a_1 = (x + y) / 2,
and
h_(n+1) = 2 a_n h_n / (a_n + h_n), with h_1 = 2xy / (x + y),
then an and hn both converge to the geometric mean of x and y.
Replacing arithmetic and harmonic mean by a pair of generalized means of opposite, finite
exponents yields the same result.
[edit] Relationship with arithmetic mean of logarithms
By using logarithmic identities to transform the formula, we can express the multiplications as a
sum and the power as a multiplication.
This is sometimes called the log-average. It is simply computing the arithmetic mean of the
logarithm transformed values of ai (i.e., the arithmetic mean on the log scale) and then using the
exponentiation to return the computation to the original scale, i.e., it is the generalised f-mean
with f(x) = log x.
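The equivalence of the direct definition and the log-average can be checked numerically; a small
sketch using the numbers 1, ½, ¼ from the earlier example:

```python
import math

data = [1.0, 0.5, 0.25]

# Direct definition: nth root of the product of the values.
direct = math.prod(data) ** (1 / len(data))

# Log-average: arithmetic mean on the log scale, then exponentiate back.
log_average = math.exp(sum(math.log(x) for x in data) / len(data))
```

Both routes give 1/2, the cube root of the product 1/8; the log-scale form is preferred in practice
because it avoids overflow when many values are multiplied.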
Harmonic mean
In mathematics, the harmonic mean (formerly sometimes called the subcontrary mean) is one
of several kinds of average. Typically, it is appropriate for situations when the average of rates is
desired.
The harmonic mean H of the positive real numbers x1, x2, ..., xn is defined to be
H = n / (1/x1 + 1/x2 + ⋯ + 1/xn).
Equivalently, the harmonic mean is the reciprocal of the arithmetic mean of the reciprocals.
[edit] Relationship with other means
A geometric construction of the three Pythagorean means (of two numbers only).
Harmonic mean denoted by H in purple color.
The harmonic mean is one of the three Pythagorean means. For all data sets containing at least
one pair of nonequal values, the harmonic mean is always the least of the three means, while the
arithmetic mean is always the greatest of the three and the geometric mean is always in between.
(If all values in a nonempty dataset are equal, the three means are always equal to one another;
e.g. the harmonic, geometric, and arithmetic means of {2, 2, 2} are all 2.)
It is the special case M−1 of the power mean.
Since the harmonic mean of a list of numbers tends strongly toward the least elements of the list,
it tends (compared to the arithmetic mean) to mitigate the impact of large outliers and aggravate
the impact of small ones.
The arithmetic mean is often mistakenly used in places calling for the harmonic mean.[1] In the
speed example below for instance the arithmetic mean 50 is incorrect, and too big.
[edit] Weighted harmonic mean
If a set of weights w1, ..., wn is associated to the dataset x1, ..., xn, the weighted harmonic mean
is defined by
H = (w1 + ⋯ + wn) / (w1/x1 + ⋯ + wn/xn).
The harmonic mean is the special case where all of the weights are equal to 1.
[edit] Examples
[edit] In physics
In certain situations, especially many situations involving rates and ratios, the harmonic mean
provides the truest average. For instance, if a vehicle travels a certain distance at a speed x (e.g.
60 kilometres per hour) and then the same distance again at a speed y (e.g. 40 kilometres per
hour), then its average speed is the harmonic mean of x and y (48 kilometres per hour), and its
total travel time is the same as if it had traveled the whole distance at that average speed.
However, if the vehicle travels for a certain amount of time at a speed x and then the same
amount of time at a speed y, then its average speed is the arithmetic mean of x and y, which in
the above example is 50 kilometres per hour. The same principle applies to more than two
segments: given a series of sub-trips at different speeds, if each sub-trip covers the same
distance, then the average speed is the harmonic mean of all the sub-trip speeds, and if each sub-
trip takes the same amount of time, then the average speed is the arithmetic mean of all the sub-
trip speeds. (If neither is the case, then a weighted harmonic mean or weighted arithmetic mean
is needed.)
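The speed example can be verified in a few lines; the 120 km leg length is an arbitrary choice
(any common distance gives the same average):

```python
def harmonic_mean(x, y):
    """Harmonic mean of two positive numbers."""
    return 2 * x * y / (x + y)

# Same distance at 60 km/h out and 40 km/h back, say 120 km each way.
distance = 120.0
total_time = distance / 60 + distance / 40   # 2 h + 3 h = 5 h
average_speed = 2 * distance / total_time    # 240 km in 5 h = 48 km/h
```

The trip-average speed equals the harmonic mean of 60 and 40, namely 48 km/h, not the
arithmetic mean of 50 km/h.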
Similarly, if one connects two electrical resistors in parallel, one having resistance x (e.g. 60Ω)
and one having resistance y (e.g. 40Ω), then the effect is the same as if one had used two
resistors with the same resistance, both equal to the harmonic mean of x and y (48Ω): the
equivalent resistance in either case is 24Ω (one-half of the harmonic mean). However, if one
connects the resistors in series, then the average resistance is the arithmetic mean of x and y (with
total resistance equal to the sum of x and y). And, as with previous example, the same principle
applies when more than two resistors are connected, provided that all are in parallel or all are in
series.
[edit] In other sciences
In Information retrieval and some other fields, the harmonic mean of the precision and the recall
is often used as an aggregated performance score: the F-score (or F-measure).
An interesting consequence arises from basic algebra in problems of working together. As an
example, if a gas-powered pump can drain a pool in 4 hours and a battery-powered pump can
drain the same pool in 6 hours, then it will take both pumps (6 · 4)/(6 + 4), which is equal to 2.4
hours, to drain the pool together. Interestingly, this is one-half of the harmonic mean of 6 and 4.
In hydrology the harmonic mean is used to average hydraulic conductivity values for flow that is
perpendicular to layers (e.g. geologic or soil). On the other hand, for flow parallel to layers the
arithmetic mean is used.
In sabermetrics, the Power-speed number of a player is the harmonic mean of his home run and
stolen base totals.
When considering fuel economy in automobiles two measures are commonly used - miles per
gallon (mpg), and litres per 100 km. As the dimensions of these quantities are the inverse of each
other (one is distance per volume, the other volume per distance) when taking the mean value of
the fuel-economy of a range of cars one measure will produce the harmonic mean of the other -
i.e. converting the mean value of fuel economy expressed in litres per 100 km to miles per gallon
will produce the harmonic mean of the fuel economy expressed in miles-per-gallon.
[edit] In finance
The harmonic mean is the preferable method for averaging multiples, such as the price/earnings
ratio, in which price is in the numerator. If these ratios are averaged using an arithmetic mean (a
common error), high data points are given greater weights than low data points. The harmonic
mean, on the other hand, gives equal weight to each data point. See "Fairness Opinions:
Common Errors and Omissions" in The Handbook of Business Valuation and Intellectual
Property Analysis (McGraw Hill, 2004).
[edit] Harmonic mean of two numbers
For the special case of just two numbers x1 and x2, the harmonic mean can be written
H = 2 x1 x2 / (x1 + x2).
In this special case, the harmonic mean is related to the arithmetic mean A = (x1 + x2) / 2 and
the geometric mean G = √(x1 x2) by
H = G² / A.
So G = √(A · H), which means the geometric mean, for two numbers, is the geometric mean of
the arithmetic mean and the harmonic mean.
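A quick numerical check of the identity G = √(A·H), using the numbers 60 and 40 from the
earlier examples:

```python
import math

x1, x2 = 60.0, 40.0
A = (x1 + x2) / 2            # arithmetic mean: 50
H = 2 * x1 * x2 / (x1 + x2)  # harmonic mean: 48
G = math.sqrt(x1 * x2)       # geometric mean: sqrt(2400)
# G**2 equals x1*x2 and A*H equals x1*x2, so G = sqrt(A*H) exactly.
```

Here A·H = 50 × 48 = 2400 = 60 × 40 = G², confirming the relation.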
Weighted mean
The weighted mean is similar to an arithmetic mean (the most common type of average), where
instead of each of the data points contributing equally to the final average, some data points
contribute more than others. The notion of weighted mean plays a role in descriptive statistics
and also occurs in a more general form in several other areas of mathematics.
If all the weights are equal, then the weighted mean is the same as the arithmetic mean. While
weighted means generally behave in a similar fashion to arithmetic means, they do have a few
counter-intuitive properties, as captured for instance in Simpson's paradox.
The term weighted average usually refers to a weighted arithmetic mean, but weighted versions
of other means can also be calculated, such as the weighted geometric mean and the weighted
harmonic mean.
[edit] Example
Given two school classes, one with 20 students, and one with 30 students, the grades in each
class on a test were:
Morning class = 62, 67, 71, 74, 76, 77, 78, 79, 79, 80, 80, 81, 81, 82, 83, 84,
86, 89, 93, 98
Afternoon class = 81, 82, 83, 84, 85, 86, 87, 87, 88, 88, 89, 89, 89, 90, 90,
90, 90, 91, 91, 91, 92, 92, 93, 93, 94, 95, 96, 97, 98, 99
The straight average for the morning class is 80 and the straight average of the afternoon class is
90. The straight average of 80 and 90 is 85, the mean of the two class means. However, this does
not account for the difference in number of students in each class, and the value of 85 does not
reflect the average student grade (independent of class). The average student grade can be
obtained by either averaging all the numbers without regard to classes, or weighting the class
means by the number of students in each class:
Or, using a weighted mean of the class means:
x̄ = (20 × 80 + 30 × 90) / (20 + 30) = 86.
The weighted mean makes it possible to find the average student grade also in the case where
only the class means and the number of students in each class are available.
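The class-average computation can be reproduced directly; both routes give the same answer:

```python
morning = [62, 67, 71, 74, 76, 77, 78, 79, 79, 80, 80, 81, 81, 82, 83, 84,
           86, 89, 93, 98]
afternoon = [81, 82, 83, 84, 85, 86, 87, 87, 88, 88, 89, 89, 89, 90, 90,
             90, 90, 91, 91, 91, 92, 92, 93, 93, 94, 95, 96, 97, 98, 99]

# Route 1: pool all 50 grades and take a plain average.
pooled_mean = sum(morning + afternoon) / (len(morning) + len(afternoon))

# Route 2: weight each class mean by its class size.
class_means = [sum(morning) / len(morning), sum(afternoon) / len(afternoon)]
weights = [len(morning), len(afternoon)]
weighted_mean = (sum(w * m for w, m in zip(weights, class_means))
                 / sum(weights))
```

The class means are 80 and 90, and both routes give 86, not the unweighted 85.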
[edit] Mathematical definition
Formally, the weighted mean of a non-empty set of data x1, x2, ..., xn with non-negative weights
w1, w2, ..., wn is the quantity
x̄ = (w1 x1 + w2 x2 + ⋯ + wn xn) / (w1 + w2 + ⋯ + wn).
Therefore data elements with a high weight contribute more to the weighted mean than do
elements with a low weight. The weights cannot be negative. Some may be zero, but not all of
them (since division by zero is not allowed).
The formulas are simplified when the weights are normalized such that they sum up to 1, i.e.
w1 + w2 + ⋯ + wn = 1; the weighted mean is then simply x̄ = w1 x1 + ⋯ + wn xn.
The common mean is a special case of the weighted mean where all data have equal
weights, wi = w.
[edit] Length-weighted mean
This is used for weighting a response variable based upon its dependency on x, a distance
variable.
Convex combination
Since only the relative weights are relevant, any weighted mean can be expressed using
coefficients that sum to one. Such a linear combination is called a convex combination.
Using the previous example, the normalized weights would be 20/50 = 0.4 and 30/50 = 0.6,
giving x_bar = 0.4 × 80 + 0.6 × 90 = 86.
Statistical properties
The weighted sample mean with normalized weights is itself a random variable. Its expected
value and standard deviation are related to the expected values and standard deviations of the
observations as follows.
If the observations have expected values E(x_i) = mu_i, then the weighted sample mean has
expectation E(x_bar) = sum_i w_i mu_i. In particular, if the expectations are all equal,
mu_i = mu, then the expectation of the weighted sample mean will be the same, E(x_bar) = mu.
For uncorrelated observations with standard deviations sigma_i, the weighted sample mean has
standard deviation sigma(x_bar) = sqrt(sum_i w_i^2 sigma_i^2). When the standard deviations
are all equal, sigma_i = sigma, and the weights are equal, w_i = 1/n, this reduces to
sigma/sqrt(n), which is related to the central limit theorem.
Dealing with variance
For the weighted mean of a list of data for which each element x_i comes from a different
probability distribution with known variance sigma_i^2, one possible choice for the weights is
given by w_i = 1/sigma_i^2 (inverse-variance weighting), which minimizes the variance of the
weighted mean.
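A minimal sketch of inverse-variance weighting, the standard choice w_i = 1/sigma_i^2; the measurement values below are made up for illustration:

```python
# Inverse-variance weighting: combine measurements x_i that share a common
# mean but have different known variances sigma_i**2, using w_i = 1/sigma_i**2.
# This choice minimizes the variance of the combined weighted mean.

def inverse_variance_mean(xs, sigmas):
    weights = [1.0 / s**2 for s in sigmas]
    mean = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
    variance = 1.0 / sum(weights)   # variance of the combined estimate
    return mean, variance

# Two measurements of the same quantity: 10 +/- 1 and 14 +/- 2.
mean, var = inverse_variance_mean([10.0, 14.0], [1.0, 2.0])
print(mean, var)  # the more precise measurement dominates
```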
For small samples, it is customary to use an unbiased estimator for the population variance. In
normal unweighted samples, the N in the denominator (corresponding to the sample size) is
changed to N − 1. While this is simple in unweighted samples, it is not straightforward when the
sample is weighted. The unbiased estimator of a weighted population variance is given by [1]:

    s^2 = (sum_i w_i (x_i − x_bar)^2) / (V1 − V2/V1),

where V1 = sum_i w_i and V2 = sum_i w_i^2.
In the general case, suppose that C is the covariance matrix relating the quantities x_i, x_bar is
the common mean to be estimated, and J is the design matrix [1, ..., 1]^T (of length n). The
Gauss–Markov theorem states that the estimate of the mean having minimum variance is given
by:

    x_bar = (J^T C^{-1} J)^{-1} J^T C^{-1} x,

and the variance of this estimate is

    sigma^2(x_bar) = (J^T C^{-1} J)^{-1}.
[Table: grading scale. 3.34–5.00: Strong; 1.67–3.33: Satisfactory; 0.00–1.66: Weak]
area 1 − e^{−1} ≈ 0.63. The tail area at step n is approximately e^{−n(1−w)}. Where primarily the
closest n observations matter and the effect of the remaining observations can be ignored safely,
then choose w such that the tail area is sufficiently small.
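The window-selection rule can be sketched as follows, assuming the tail-area approximation e^{−k(1−w)} stated above; the tolerance value is illustrative:

```python
import math

# Exponentially decreasing weights w**0, w**1, ...: the closest observations
# matter most.  The tail of the weight beyond step k has area roughly
# exp(-k * (1 - w)), so choose k large enough that the tail is negligible.

def smallest_window(w, tail_tolerance):
    """Smallest k with exp(-k*(1-w)) <= tail_tolerance, for 0 < w < 1."""
    return math.ceil(-math.log(tail_tolerance) / (1.0 - w))

def exp_weighted_mean(xs, w):
    """Weighted mean of xs (most recent observation first) with weights w**i."""
    weights = [w**i for i in range(len(xs))]
    return sum(wi * x for wi, x in zip(weights, xs)) / sum(weights)

k = smallest_window(w=0.9, tail_tolerance=0.01)  # tail < 1% of total weight
print(k)
```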
Volatility smile
Risk reversal as an investment strategy can be used to simulate being long in a stock while
reducing the downside risk. Using a risk reversal is a high-leverage technique.
From an article at QuantPrinciple: "A risk reversal is a position in which you simulate the
behavior of a long; therefore it is sometimes called a synthetic long. This is an investment
strategy that amounts to both buying and selling out-of-the-money options simultaneously. In
this strategy, the investor first makes a market hunch; if that hunch is bullish he will want to go
long. However, instead of going long on the stock, he will buy an out-of-the-money call option,
and simultaneously sell an out-of-the-money put option. Presumably he will use the money from
the sale of the put option to purchase the call option. Then as the stock goes up in price, the call
option will be worth more, and the put option will be worth less." [1]
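The payoff profile described in the quote can be sketched at expiry; the strikes and spot prices below are hypothetical:

```python
# Payoff at expiry of a risk reversal (synthetic long): long an
# out-of-the-money call at strike K_call, short an out-of-the-money put at
# strike K_put, with K_put < spot < K_call at inception.

def risk_reversal_payoff(S, K_put, K_call):
    long_call = max(S - K_call, 0.0)
    short_put = -max(K_put - S, 0.0)
    return long_call + short_put

K_put, K_call = 90.0, 110.0
for S in (80.0, 100.0, 120.0):
    print(S, risk_reversal_payoff(S, K_put, K_call))
# Gains when the stock rises, loses when it falls: long-like exposure.
```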
Put–call parity
In financial mathematics, put-call parity defines a relationship between the price of a call option
and a put option—both with the identical strike price and expiry. To derive the put-call parity
relationship, the assumption is that the options are not exercised before expiration day, which
necessarily applies to European options. Put-call parity can be derived in a manner that is largely
model independent.
Derivation
An example using stock options follows, though this may be generalised to other options.
Consider a call option and a put option with the same strike K for expiry at the same date T on
some stock, which pays no dividend. Let S denote the (unknown) underlying value at expiration.
First consider a portfolio that consists of one put option and one share. This portfolio at time T
has value:

    P(T) + S = max(K − S, 0) + S = max(K, S).

Now consider a portfolio that consists of one call option and K bonds that each pay 1 (with
certainty) at time T. This portfolio at T has value:

    C(T) + K = max(S − K, 0) + K = max(K, S).
Notice that, whatever the final share price S is at time T, each portfolio is worth the same as the
other. This implies that these two portfolios must have the same value at any time t before T. To
prove this suppose that, at some time t, one portfolio were cheaper than the other. Then one
could purchase (go long) the cheaper portfolio and sell (go short) the more expensive. Our
overall portfolio would, for any value of the share price, have zero value at T. We would be left
with the profit we made at time t. This is known as a risk-less profit and represents an arbitrage
opportunity.
Thus the following relationship exists between the value of the various instruments at a general
time t:

    C(t) − P(t) = S(t) − K · B(t, T),

where
C(t) is the value of the call at time t, P(t) is the value of the put, S(t) is the value of the share,
and B(t, T) is the value at time t of a bond that pays 1 at maturity T.
Using the above, and given no arbitrage opportunities, for any three of the prices of the call, put,
bond and stock, one can compute the implied price of the fourth.
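As a numerical illustration, the parity relation can be checked against Black–Scholes prices (a model chosen purely for illustration; parity itself is model-independent), with B(t, T) = exp(−rT) and illustrative parameters:

```python
import math

# Numerical check of put-call parity, C - P = S - K*B(t,T), using
# Black-Scholes call and put prices.

def _norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes(S, K, T, r, sigma):
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    call = S * _norm_cdf(d1) - K * math.exp(-r * T) * _norm_cdf(d2)
    put = K * math.exp(-r * T) * _norm_cdf(-d2) - S * _norm_cdf(-d1)
    return call, put

S, K, T, r, sigma = 100.0, 105.0, 1.0, 0.05, 0.2
call, put = black_scholes(S, K, T, r, sigma)
lhs = call - put
rhs = S - K * math.exp(-r * T)
print(lhs, rhs)  # the two sides agree to rounding error
```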
When valuing European options written on stocks with known dividends that will be paid out
during the life of the option, the formula becomes:

    C(t) − P(t) = S(t) − D(t) − K · B(t, T),

where D(t) represents the present value of the dividends to be paid out before expiration of the
option.
History
Nelson, an option arbitrage trader in New York, published a book, "The ABC of Option
Arbitrage", in 1904 that describes put-call parity in detail. His book was re-discovered by
Espen Gaarder Haug in the early 2000s, and many references from Nelson's book are given in
Haug's book "Derivatives Models on Models".
Henry Deutsch described put-call parity in 1910 in his book "Arbitrage in Bullion, Coins,
Bills, Stocks, Shares and Options, 2nd Edition" (London: Engham Wilson), but in less detail
than Nelson (1904).
Mathematics professor Vinzenz Bronzin also derived the put-call parity in 1908 and used it as
part of his arbitrage argument to develop a series of mathematical option models under a series
of different distributions. Bronzin's work was only recently rediscovered by professor Wolfgang
Hafner and professor Heinz Zimmermann. Bronzin's original work is a book written in German
and is now translated and published in English in an edited work by Hafner and Zimmermann
("Vinzenz Bronzin's Option Pricing Models", Springer Verlag).
Michael Knoll, in The Ancient Roots of Modern Financial Innovation: The Early History of
Regulatory Arbitrage, describes the important role that put-call parity played in developing the
equity of redemption, the defining characteristic of a modern mortgage, in Medieval England.
Russell Sage used put-call parity to create synthetic loans, which had higher interest rates than
the usury laws of the time would have normally allowed.
Its first description in the "modern" literature appears to be Hans Stoll's paper, The Relation
Between Put and Call Prices, from 1969.
Implications
Put-call parity implies:
• Equivalence of calls and puts: Parity implies that a call and a put can be used
interchangeably in any delta-neutral portfolio. If d is the call's delta, then
buying a call, and selling d shares of stock, is the same as buying a put and
buying 1 − d shares of stock. Equivalence of calls and puts is very
important when trading options.
• Parity of implied volatility: In the absence of dividends or other costs of carry
(such as when a stock is difficult to borrow or sell short), the implied volatility
of calls and puts must be identical.[1]
Volatility swap
In finance, a volatility swap is a forward contract on the future realised volatility of a given
underlying asset. Volatility swaps allow investors to trade the volatility of an asset directly, much
as they would trade a price index.
The underlying is usually a foreign exchange (FX) rate (a very liquid market) but could as well
be a single-name equity or an index. However, the variance swap is preferred in the equity
market because it can be replicated with a linear combination of options and a dynamic position
in futures.
Unlike a stock option, whose volatility exposure is contaminated by its stock price dependence,
these swaps provide pure exposure to volatility alone. You can use these instruments to speculate
on future volatility levels, to trade the spread between realized and implied volatility, or to hedge
the volatility exposure of other positions or businesses.
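A minimal sketch of a volatility swap payoff, assuming the common convention of annualized zero-mean daily log returns over 252 trading days (actual contract terms vary), with illustrative prices:

```python
import math

# Payoff of a volatility swap: notional * (realized_vol - strike_vol).

def realized_vol(prices, periods_per_year=252):
    """Annualized standard deviation of log returns (zero-mean convention)."""
    rets = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    var = sum(r * r for r in rets) / len(rets)
    return math.sqrt(var * periods_per_year)

def vol_swap_payoff(prices, strike_vol, notional=1.0):
    return notional * (realized_vol(prices) - strike_vol)

prices = [100.0, 101.0, 99.5, 100.5, 100.0]
print(vol_swap_payoff(prices, strike_vol=0.15, notional=1000.0))
```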
Heston model
In finance, the Heston model, named after Steven Heston, is a mathematical model describing
the evolution of the volatility of an underlying asset [1]. It is a stochastic volatility model: such a
model assumes that the volatility of the asset is not constant, nor even deterministic, but follows
a random process.
Basic Heston model
The basic Heston model assumes that St, the price of the asset, is determined by a stochastic
process:

    dS_t = mu S_t dt + sqrt(v_t) S_t dW_t^S,

where v_t, the instantaneous variance, follows a CIR process:

    dv_t = kappa (theta − v_t) dt + xi sqrt(v_t) dW_t^v,

and W_t^S, W_t^v are Wiener processes (i.e., random walks) with correlation ρ. In order to
retain model tractability, one may impose the parameters to be piecewise-constant.
Another approach is to add a second process of variance, independent of the first one.
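The Heston dynamics can be simulated with a rough Euler–Maruyama scheme; the parameters below are illustrative, and the simple truncation that keeps the variance non-negative is a sketch (production schemes such as the quadratic-exponential method are more careful):

```python
import math
import random

# Euler-Maruyama sketch of the Heston dynamics
#   dS = mu*S dt + sqrt(v)*S dW_S,   dv = kappa*(theta - v) dt + xi*sqrt(v) dW_v,
# with corr(dW_S, dW_v) = rho.  Variance is floored at zero so sqrt(v) is real.

def simulate_heston(S0, v0, mu, kappa, theta, xi, rho, T, steps, seed=0):
    rng = random.Random(seed)
    dt = T / steps
    S, v = S0, v0
    for _ in range(steps):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, 1.0)
        vp = max(v, 0.0)                      # full-truncation floor
        S += mu * S * dt + math.sqrt(vp) * S * math.sqrt(dt) * z1
        v += kappa * (theta - vp) * dt + xi * math.sqrt(vp) * math.sqrt(dt) * z2
    return S, max(v, 0.0)

S_T, v_T = simulate_heston(S0=100.0, v0=0.04, mu=0.05, kappa=2.0,
                           theta=0.04, xi=0.3, rho=-0.7, T=1.0, steps=252)
print(S_T, v_T)
```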
A significant extension of the Heston model that makes both the volatility and the mean
stochastic is given by Lin Chen (1996). In the Chen model the dynamics of the instantaneous
interest rate are specified by three coupled stochastic processes for the short rate, its mean and
its volatility.
Risk-neutral measure
See Risk-neutral measure for the complete article
Delta neutral
In finance, delta neutral describes a portfolio of related financial securities in which the
portfolio value remains unchanged due to small changes in the value of the underlying security.
Such a portfolio typically contains options and their corresponding underlying securities such
that positive and negative delta components offset, resulting in the portfolio's value being
relatively insensitive to changes in the value of the underlying security.
A related term, delta hedging is the process of setting or keeping the delta of a portfolio as close
to zero as possible. In practice, maintaining a zero delta is very complex because there are risks
associated with re-hedging on large movements in the underlying stock's price, and research
indicates portfolios tend to have lower cash flows if re-hedged too frequently.[1]
Nomenclature
δ The sensitivity of an option's value to a change in the underlying stock's price.
V0 The initial value of the option.
V The current value of the option.
S0 The initial value of the underlying stock.
Mathematical interpretation
Main article: Greeks (finance)
Delta measures the sensitivity of the value of an option to changes in the price of the underlying
stock, assuming all other variables remain unchanged.[2]
Mathematically, delta is represented as the partial derivative of the option's fair value with
respect to the price of the underlying security: Δ = ∂V/∂S.
Delta is clearly a function of S; however, it is also a function of the strike price and the time to
expiry.[3]
Therefore, if a position is delta neutral (or, instantaneously delta-hedged) its instantaneous
change in value, for an infinitesimal change in the value of the underlying security, will be zero;
see Hedge (finance). Since delta measures the exposure of a derivative to changes in the value of
the underlying, a portfolio that is delta neutral is effectively hedged. That is, its overall value will
not change for small changes in the price of its underlying instrument.
Creating the position
Delta hedging - i.e. establishing the required hedge - may be accomplished by buying or selling
an amount of the underlier that corresponds to the delta of the portfolio. By adjusting the amount
bought or sold on new positions, the portfolio delta can be made to sum to zero, and the portfolio
is then delta neutral.
Options market makers, or others, may form a delta neutral portfolio using related options
instead of the underlying. The portfolio's delta (assuming the same underlier) is then the sum of
all the individual options' deltas. This method can also be used when the underlier is difficult to
trade, for instance when an underlying stock is hard to borrow and therefore cannot be sold short.
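A sketch of forming the hedge with the underlier, using the Black–Scholes call delta N(d1) as the model delta (the model choice and all parameters are illustrative):

```python
import math

# Delta-neutral hedge sketch: short delta shares per long call to zero the
# portfolio delta.  Black-Scholes call delta is N(d1).

def call_delta(S, K, T, r, sigma):
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    return 0.5 * (1.0 + math.erf(d1 / math.sqrt(2.0)))  # N(d1)

contracts = 10  # long 10 calls (one share per contract, for simplicity)
delta = call_delta(S=100.0, K=100.0, T=0.5, r=0.05, sigma=0.2)
shares_to_short = contracts * delta
portfolio_delta = contracts * delta - shares_to_short
print(shares_to_short, portfolio_delta)  # portfolio delta is 0.0
```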
Theory
The existence of a delta neutral portfolio was shown as part of the original proof of the Black-
Scholes model, the first comprehensive model to produce correct prices for some classes of
options.
From the Taylor expansion of the value of an option, we get the change in the value of an
option for a change ΔS in the price of the underlier:

    dV ≈ (∂V/∂S) ΔS + (1/2)(∂²V/∂S²)(ΔS)² = Δ · ΔS + (1/2) Γ · (ΔS)².

When the change in the value of the underlier is not small, the second-order term,
(1/2) Γ (ΔS)², cannot be ignored. In practice, maintaining a delta neutral portfolio requires
continual recalculation of the position's Greeks and rebalancing of the underlier's position.
Typically, this rebalancing is performed daily or weekly.
Variance risk premium
The variance risk premium is the phenomenon on the variance swap market of the variance
swap strike being greater than the realized variance on average. For most trades, the buyer of
variance ends up with a loss on the trade, while the seller profits.[1] The amount that the buyer of
variance typically loses in entering into the variance swap is known as the variance risk
premium. The variance risk premium can be naively justified by taking into account the large
negative convexity of a short variance position; variance during the rare times of crisis can be
50-100 times that of normal market conditions.
Using insurance as an analogy, the variance buyer typically pays a premium to be able to receive
the large positive payoff of a variance swap in times of market turmoil, to "insure" against times
of market turmoil.
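The buyer's profit and loss can be sketched directly from the payoff definition of a variance swap; the strike and realized volatility levels below are illustrative:

```python
# Variance swap payoff: notional * (realized_variance - strike_variance),
# both in annualized variance terms.  A positive variance risk premium means
# the strike sits above realized variance on average, so the buyer usually
# loses a little but gains heavily in a crisis.

def variance_swap_pnl_for_buyer(realized_var, strike_var, notional):
    return notional * (realized_var - strike_var)

strike_var = 0.20**2      # strike quoted as 20% vol -> 0.04 variance
calm_var = 0.17**2        # typical calm year: realized 17% vol
crisis_var = 0.60**2      # crisis year: realized 60% vol
print(variance_swap_pnl_for_buyer(calm_var, strike_var, 100_000))    # loss
print(variance_swap_pnl_for_buyer(crisis_var, strike_var, 100_000))  # gain
```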
The variance risk premium can also be analysed from the perspective of asset allocation. Carr
and Wu (2007) examine whether the excess returns of selling or buying variance swaps can be
explained using common factor models such as the CAPM and the Fama-French factors,
which include returns of different segments of stocks on the market. Despite the intuitive
connection between stock price volatility and stock price, none of these models are able to
strongly explain the excess returns on variance swaps. This implies that there is another factor
that is unrelated to stock prices that affects how much, on average, one will pay to enter into a
variance swap contract. This suggests that investors are willing to pay extra money to enter into
variance swaps because they dislike variance, not just because it is anti-correlated with stock
prices, but in its own right. This leads many to consider variance an asset class in and of itself.
In the years before the 2008 Financial Crisis, selling variance on a rolling basis was a popular
trade among hedge funds and other institutional investors.
Efficient-market hypothesis
In finance, the efficient-market hypothesis (EMH) asserts that financial markets are
"informationally efficient". The weak version of EMH supposes that prices on traded assets
(e.g., stocks, bonds, or property) already reflect all past publicly available information. The semi-
strong version supposes that prices reflect all publicly available information and instantly change
to reflect new information. The strong version supposes that the market reflects even hidden or
inside information. There is some disputed evidence to suggest that the weak and semi-strong
versions are valid, while there is powerful evidence against the strong version. Therefore,
according to the theory, it is improbable that one can consistently outperform the market by
using any information that the market already has, except through insider trading. Information or
news in the EMH is defined as
anything that may affect prices that is unknowable in the present and thus appears randomly in
the future. The hypothesis has been attacked by critics who blame the belief in rational markets
for much of the financial crisis of 2007–2010,[1][2] with noted financial journalist Roger
Lowenstein declaring "The upside of the current Great Recession is that it could drive a stake
through the heart of the academic nostrum known as the efficient-market hypothesis."[3]
Historical background
The efficient-market hypothesis was first expressed by Louis Bachelier, a French mathematician,
in his 1900 dissertation, "The Theory of Speculation".[4] His work was largely ignored until the
1950s; however, beginning in the 1930s scattered, independent work corroborated his thesis. A
small number of studies indicated that US stock prices and related financial series followed a
random walk model.[5] Research by Alfred Cowles in the 1930s and 1940s suggested that
professional investors were in general unable to outperform the market.
The efficient-market hypothesis was developed by Professor Eugene Fama at the University of
Chicago Booth School of Business as an academic concept of study through his published Ph.D.
thesis in the early 1960s at the same school. It was widely accepted up until the 1990s, when
behavioral finance economists, who were a fringe element, became mainstream.[6] Empirical
analyses have consistently found problems with the efficient-market hypothesis, the most
consistent being that stocks with low price to earnings (and similarly, low price to cash-flow or
book value) outperform other stocks.[7][8] Alternative theories have proposed that cognitive biases
cause these inefficiencies, leading investors to purchase overpriced growth stocks rather than
value stocks.[6] Although the efficient-market hypothesis has become controversial because
substantial and lasting inefficiencies are observed, Beechey et al. (2000) consider that it remains
a worthwhile starting point.[9]
The efficient-market hypothesis emerged as a prominent theory in the mid-1960s. Paul
Samuelson had begun to circulate Bachelier's work among economists. In 1964 Bachelier's
dissertation along with the empirical studies mentioned above were published in an anthology
edited by Paul Cootner.[10] In 1965 Eugene Fama published his dissertation arguing for the
random walk hypothesis,[11] and Samuelson published a proof for a version of the efficient-
market hypothesis.[12] In 1970 Fama published a review of both the theory and the evidence for
the hypothesis. The paper extended and refined the theory, and included the definitions for three
forms of financial market efficiency: weak, semi-strong and strong (see below).[13]
Further to evidence suggesting that the UK stock market is weak-form efficient, other studies of
capital markets have pointed toward their being semi-strong-form efficient. Studies by Firth (1976,
1979, and 1980) in the United Kingdom have compared the share prices existing after a takeover
announcement with the bid offer. Firth found that the share prices were fully and instantaneously
adjusted to their correct levels, thus concluding that the UK stock market was semi-strong-form
efficient. However, the market's ability to efficiently respond to a short term, widely publicized
event such as a takeover announcement does not necessarily prove market efficiency related to
other more long term, amorphous factors. David Dreman has criticized the evidence provided by
this instant "efficient" response, pointing out that an immediate response is not necessarily
efficient, and that the long-term performance of the stock in response to certain movements is a
better indication. A study of stocks' response to dividend cuts or increases over three years
found that after an announcement of a dividend cut, stocks underperformed the market by 15.3%
for the three-year period, while stocks outperformed the market by 24.8% in the three years
following a dividend increase announcement.[14]
Theoretical background
Beyond the normal utility maximizing agents, the efficient-market hypothesis requires that
agents have rational expectations; that on average the population is correct (even if no one
person is) and whenever new relevant information appears, the agents update their expectations
appropriately. Note that it is not required that the agents be rational. EMH allows that when
faced with new information, some investors may overreact and some may underreact. All that is
required by the EMH is that investors' reactions be random and follow a normal distribution
pattern so that the net effect on market prices cannot be reliably exploited to make an abnormal
profit, especially when considering transaction costs (including commissions and spreads). Thus,
any one person can be wrong about the market — indeed, everyone can be — but the market as a
whole is always right. There are three common forms in which the efficient-market hypothesis is
commonly stated — weak-form efficiency, semi-strong-form efficiency and strong-form
efficiency, each of which has different implications for how markets work.
In weak-form efficiency, future prices cannot be predicted by analyzing prices from the past.
Excess returns cannot be earned in the long run by using investment strategies based on
historical share prices or other historical data. Technical analysis techniques will not be able to
consistently produce excess returns, though some forms of fundamental analysis may still
provide excess returns. Share prices exhibit no serial dependencies, meaning that there are no
"patterns" to asset prices. This implies that future price movements are determined entirely by
information not contained in the price series. Hence, prices must follow a random walk. This
'soft' EMH does not require that prices remain at or near equilibrium, but only that market
participants not be able to systematically profit from market 'inefficiencies'. However, while
EMH predicts that all price movement (in the absence of change in fundamental information) is
random (i.e., non-trending), many studies have shown a marked tendency for the stock markets
to trend over time periods of weeks or longer[15] and that, moreover, there is a positive correlation
between degree of trending and length of time period studied[16] (but note that over long time
periods, the trending is sinusoidal in appearance). Various explanations for such large and
apparently non-random price movements have been promulgated. But the best explanation seems
to be that the distribution of stock market prices is non-Gaussian (in which case EMH, in any of
its current forms, would not be strictly applicable).[17][18]
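A minimal check of serial dependence, the quantity weak-form efficiency predicts to be near zero, is the lag-1 autocorrelation of a return series. The data below are synthetic; real tests, such as Lo and MacKinlay's variance-ratio test, are more rigorous:

```python
# Lag-1 autocorrelation of a return series.  Values near zero are consistent
# with the random-walk prediction; persistent positive values indicate the
# kind of trending behavior cited against weak-form EMH.

def lag1_autocorr(returns):
    n = len(returns)
    mean = sum(returns) / n
    num = sum((returns[i] - mean) * (returns[i + 1] - mean) for i in range(n - 1))
    den = sum((r - mean) ** 2 for r in returns)
    return num / den

trending = [0.01, 0.012, 0.011, 0.013, 0.012, 0.014, 0.013, 0.015]
alternating = [0.01, -0.01, 0.01, -0.01, 0.01, -0.01, 0.01, -0.01]
print(lag1_autocorr(trending))     # positive: momentum-like dependence
print(lag1_autocorr(alternating))  # negative: mean-reversion-like
```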
In semi-strong-form efficiency, it is implied that share prices adjust to publicly available new
information very rapidly and in an unbiased fashion, such that no excess returns can be earned by
trading on that information. Semi-strong-form efficiency implies that neither fundamental
analysis nor technical analysis techniques will be able to reliably produce excess returns. To test
for semi-strong-form efficiency, the adjustments to previously unknown news must be of a
reasonable size and must be instantaneous. To test for this, consistent upward or downward
adjustments after the initial change must be looked for. If there are any such adjustments it
would suggest that investors had interpreted the information in a biased fashion and hence in an
inefficient manner.
In strong-form efficiency, share prices reflect all information, public and private, and no one
can earn excess returns. If there are legal barriers to private information becoming public, as with
insider trading laws, strong-form efficiency is impossible, except in the case where the laws are
universally ignored. To test for strong-form efficiency, a market needs to exist where investors
cannot consistently earn excess returns over a long period of time. Even if some money
managers are consistently observed to beat the market, no refutation even of strong-form
efficiency follows: with hundreds of thousands of fund managers worldwide, even a normal
distribution of returns (as efficiency predicts) should be expected to produce a few dozen "star"
performers.
Criticism and behavioral finance
Price-earnings ratios as a predictor of twenty-year returns, based upon the plot by
Robert Shiller (Figure 10.1[19]). The horizontal axis shows the real price-
earnings ratio of the S&P Composite Stock Price Index as computed in Irrational
Exuberance (inflation adjusted price divided by the prior ten-year mean of inflation-
adjusted earnings). The vertical axis shows the geometric average real annual
return on investing in the S&P Composite Stock Price Index, reinvesting dividends,
and selling twenty years later. Data from different twenty-year periods is color-
coded as shown in the key. See also ten-year returns. Shiller states that this plot
"confirms that long-term investors—investors who commit their money to an
investment for ten full years—did do well when prices were low relative to earnings
at the beginning of the ten years. Long-term investors would be well advised,
individually, to lower their exposure to the stock market when it is high, as it has
been recently, and get into the market when it is low."[19] Burton Malkiel stated that
this correlation may be consistent with an efficient market due to differences in
interest rates.[20]
Investors and researchers have disputed the efficient-market hypothesis both empirically and
theoretically. Behavioral economists attribute the imperfections in financial markets to a
combination of cognitive biases such as overconfidence, overreaction, representative bias,
information bias, and various other predictable human errors in reasoning and information
processing. These have been researched by psychologists such as Daniel Kahneman, Amos
Tversky, Richard Thaler, and Paul Slovic. These errors in reasoning lead most investors to avoid
value stocks and buy growth stocks at expensive prices, which allow those who reason correctly
to profit from bargains in neglected value stocks and the overreacted selling of growth stocks.
Empirical evidence has been mixed, but has generally not supported strong forms of the
efficient-market hypothesis.[7][8][21] According to Dreman, in a 1995 paper, low P/E stocks have
greater returns.[22] In an earlier paper he also refuted the assertion by Ray Ball that these higher
returns could be attributed to higher beta,[23] whose research had been accepted by efficient
market theorists as explaining the anomaly[24] in neat accordance with modern portfolio theory.
One can identify "losers" as stocks that have had poor returns over some number of past years.
"Winners" would be those stocks that had high returns over a similar period. The main result of
one such study is that losers have much higher average returns than winners over the following
period of the same number of years.[25] A later study showed that beta (β) cannot account for this
difference in average returns.[26] This tendency of returns to reverse over long horizons (i.e.,
losers become winners) is yet another contradiction of EMH. Losers would have to have much
higher betas than winners in order to justify the return difference. The study showed that the beta
difference required to save the EMH is just not there.
Speculative economic bubbles are an obvious anomaly, in that the market often appears to be
driven by buyers operating on irrational exuberance, who take little notice of underlying value.
These bubbles are typically followed by an overreaction of frantic selling, allowing shrewd
investors to buy stocks at bargain prices. Rational investors have difficulty profiting by shorting
irrational bubbles because, as John Maynard Keynes commented, "Markets can remain irrational
longer than you can remain solvent."[27] Sudden market crashes, as happened on Black Monday in
1987 are mysterious from the perspective of efficient markets, but allowed as a rare statistical
event under the Weak-form of EMH.
Burton Malkiel, a well-known proponent of the general validity of EMH, has warned that certain
emerging markets such as China are not empirically efficient; that the Shanghai and Shenzhen
markets, unlike markets in United States, exhibit considerable serial correlation (price trends),
non-random walk, and evidence of manipulation.[28]
Behavioral psychology approaches to stock market trading are among some of the more
promising alternatives to EMH (and some investment strategies seek to exploit exactly such
inefficiencies). But Nobel Laureate co-founder of the programme—Daniel Kahneman—
announced his skepticism of investors beating the market: "They're [investors] just not going to
do it [beat the market]. It's just not going to happen."[29] Indeed, defenders of EMH maintain that
behavioral finance strengthens the case for EMH in that it highlights biases in individuals and
committees and not competitive markets. For example, one prominent finding in behavioral
finance is that individuals employ hyperbolic discounting, while bonds, mortgages, annuities and
other similar financial instruments subject to competitive market forces palpably do not. Any
manifestation of hyperbolic discounting in the pricing of these obligations would invite
arbitrage, thereby quickly eliminating any vestige of individual biases. Similarly,
diversification, derivative securities and other hedging strategies assuage if not eliminate
potential mispricings from the severe risk-intolerance (loss aversion) of individuals underscored
by behavioral finance. On the other hand, economists, behavioral psychologists and mutual fund
managers are drawn from the human population and are therefore subject to the biases that
behavioralists showcase. By contrast, the price signals in markets are far less subject to
individual biases highlighted by the Behavioral Finance programme. Richard Thaler has started a
fund based on his research on cognitive biases. In a 2008 report he identified complexity and
herd behavior as central to the global financial crisis of 2008.[30]
Further empirical work has highlighted the impact transaction costs have on the concept of
market efficiency, with much evidence suggesting that any anomalies pertaining to market
inefficiencies are the result of a cost benefit analysis made by those willing to incur the cost of
acquiring the valuable information in order to trade on it. Additionally the concept of liquidity is
a critical component to capturing "inefficiencies" in tests for abnormal returns. Any test of this
proposition faces the joint hypothesis problem, where it is impossible to ever test for market
efficiency, since to do so requires the use of a measuring stick against which abnormal returns
are compared - one cannot know if the market is efficient if one does not know if a model
correctly stipulates the required rate of return. Consequently, a situation arises where either the
asset pricing model is incorrect or the market is inefficient, but one has no way of knowing
which is the case.[citation needed]
A key work on the random walk was done in the late 1980s by Profs. Andrew Lo and Craig
MacKinlay; they effectively argue that a random walk does not exist, nor ever has. Their paper
took almost two years to be accepted by academia, and in 2001 they published "A Non-Random
Walk Down Wall Street", which explained the paper in layman's terms.[citation needed]
Recent financial crisis
The recent global financial crisis has led to renewed scrutiny and criticism of the hypothesis.[31]
Market strategist Jeremy Grantham has stated flatly that EMH is responsible for the current
financial crisis, claiming that belief in the hypothesis caused financial leaders to have a "chronic
underestimation of the dangers of asset bubbles breaking".[2]
At the International Organization of Securities Commissions annual conference, held in June
2009, the hypothesis took center stage. Martin Wolf, the chief economics commentator for the
Financial Times, dismissed the hypothesis as being a useless way to examine how markets
function in reality. Paul McCulley, managing director of PIMCO, was less extreme in his
criticism, saying that the hypothesis had not failed, but was "seriously flawed" in its neglect of
human nature.[32]
The financial crisis has led Richard Posner, a prominent judge, University of Chicago law
professor, and innovator in the field of Law and Economics, to back away from the hypothesis
and express some degree of belief in Keynesian economics. Posner accused some of his 'Chicago
School' colleagues of being "asleep at the switch", saying that "the movement to deregulate the
financial industry went too far by exaggerating the resilience - the self healing powers - of
laissez-faire capitalism."[33] Others, such as Fama himself, said that the theory held up well
during the crisis and that the markets were a casualty of the recession, not the cause of it.
[edit]Popular reception
Despite the best efforts of EMH proponents such as Burton Malkiel, whose book A Random
Walk Down Wall Street achieved best-seller status, the EMH has not caught the public's
imagination. Various forms of stock picking, such as active management, are promoted by
popular CNBC commentator Jim Cramer and former Fidelity Investments fund manager Peter
Lynch, whose books and articles have popularised the notion that investors can "beat the
market".[citation needed]
Many believe that EMH says that a security's price is a correct representation of the value of that
business, as calculated by what the business's future returns will actually be. In other words, they
believe that EMH says a stock's price correctly predicts the underlying company's future results.
Since stock prices clearly do not reflect company future results in many cases, many people
reject EMH as clearly wrong.[citation needed]
Efficient-market hypothesis
In finance, the efficient-market hypothesis (EMH) asserts that financial markets are
"informationally efficient". The weak version of EMH supposes that prices on traded assets
(e.g., stocks, bonds, or property) already reflect all past publicly available information. The semi-
strong version supposes that prices reflect all publicly available information and instantly change
to reflect new information. The strong version supposes that market prices reflect even hidden or insider
information. There is some disputed evidence to suggest that the weak and semi-strong versions
are valid while there is powerful evidence against the strong version. Therefore, according to
theory, one cannot consistently outperform the market by using any information that the
market already has, except through insider trading. Information or news in the EMH is defined as
anything that may affect prices that is unknowable in the present and thus appears randomly in
the future. The hypothesis has been attacked by critics who blame the belief in rational markets
for much of the financial crisis of 2007–2010,[1][2] with noted financial journalist Roger
Lowenstein declaring "The upside of the current Great Recession is that it could drive a stake
through the heart of the academic nostrum known as the efficient-market hypothesis."[3]
Historical background
The efficient-market hypothesis was first expressed by Louis Bachelier, a French mathematician,
in his 1900 dissertation, "The Theory of Speculation".[4] His work was largely ignored until the
1950s; however, beginning in the 1930s scattered, independent work corroborated his thesis. A
small number of studies indicated that US stock prices and related financial series followed a
random walk model.[5] Research by Alfred Cowles in the ’30s and ’40s suggested that
professional investors were in general unable to outperform the market.
The efficient-market hypothesis was developed by Professor Eugene Fama at the University of
Chicago Booth School of Business as an academic concept of study through his published Ph.D.
thesis in the early 1960s at the same school. It was widely accepted up until the 1990s, when
behavioral finance economists, who were a fringe element, became mainstream.[6] Empirical
analyses have consistently found problems with the efficient-market hypothesis, the most
consistent being that stocks with low price to earnings (and similarly, low price to cash-flow or
book value) outperform other stocks.[7][8] Alternative theories have proposed that cognitive biases
cause these inefficiencies, leading investors to purchase overpriced growth stocks rather than
value stocks.[6] Although the efficient-market hypothesis has become controversial because
substantial and lasting inefficiencies are observed, Beechey et al. (2000) consider that it remains
a worthwhile starting point.[9]
The efficient-market hypothesis emerged as a prominent theory in the mid-1960s. Paul
Samuelson had begun to circulate Bachelier's work among economists. In 1964 Bachelier's
dissertation along with the empirical studies mentioned above were published in an anthology
edited by Paul Cootner.[10] In 1965 Eugene Fama published his dissertation arguing for the
random walk hypothesis,[11] and Samuelson published a proof for a version of the efficient-
market hypothesis.[12] In 1970 Fama published a review of both the theory and the evidence for
the hypothesis. The paper extended and refined the theory, included the definitions for three
forms of financial market efficiency: weak, semi-strong and strong (see below).[13]
Further to this evidence that the UK stock market is weak-form efficient, other studies of capital
markets have pointed toward their being semi-strong-form efficient. Studies by Firth (1976,
1979, and 1980) in the United Kingdom have compared the share prices existing after a takeover
announcement with the bid offer. Firth found that the share prices were fully and instantaneously
adjusted to their correct levels, thus concluding that the UK stock market was semi-strong-form
efficient. However, the market's ability to efficiently respond to a short term, widely publicized
event such as a takeover announcement does not necessarily prove market efficiency related to
other more long term, amorphous factors. David Dreman has criticized the evidence provided by
this instant "efficient" response, pointing out that an immediate response is not necessarily
efficient, and that the long-term performance of the stock in response to certain movements is a
better indication. A study of stocks' response to dividend cuts or increases found that after a
dividend-cut announcement, stocks underperformed the market by 15.3% over the following
three years, while after a dividend-increase announcement they outperformed the market by
24.8% over the same horizon.[14]
Theoretical background
Beyond the normal utility maximizing agents, the efficient-market hypothesis requires that
agents have rational expectations; that on average the population is correct (even if no one
person is) and whenever new relevant information appears, the agents update their expectations
appropriately. Note that it is not required that the agents be rational. EMH allows that when
faced with new information, some investors may overreact and some may underreact. All that is
required by the EMH is that investors' reactions be random and follow a normal distribution
pattern so that the net effect on market prices cannot be reliably exploited to make an abnormal
profit, especially when considering transaction costs (including commissions and spreads). Thus,
any one person can be wrong about the market — indeed, everyone can be — but the market as a
whole is always right. There are three common forms in which the efficient-market hypothesis is
commonly stated — weak-form efficiency, semi-strong-form efficiency and strong-form
efficiency, each of which has different implications for how markets work.
In weak-form efficiency, future prices cannot be predicted by analyzing price from the past.
Excess returns can not be earned in the long run by using investment strategies based on
historical share prices or other historical data. Technical analysis techniques will not be able to
consistently produce excess returns, though some forms of fundamental analysis may still
provide excess returns. Share prices exhibit no serial dependencies, meaning that there are no
"patterns" to asset prices. This implies that future price movements are determined entirely by
information not contained in the price series. Hence, prices must follow a random walk. This
'soft' EMH does not require that prices remain at or near equilibrium, but only that market
participants not be able to systematically profit from market 'inefficiencies'. However, while
EMH predicts that all price movement (in the absence of change in fundamental information) is
random (i.e., non-trending), many studies have shown a marked tendency for the stock markets
to trend over time periods of weeks or longer[15] and that, moreover, there is a positive correlation
between degree of trending and length of time period studied[16] (but note that over long time
periods, the trending is sinusoidal in appearance). Various explanations for such large and
apparently non-random price movements have been promulgated. But the best explanation seems
to be that the distribution of stock market prices is non-Gaussian (in which case EMH, in any of
its current forms, would not be strictly applicable).[17][18]
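A minimal way to make the weak-form claim concrete is to estimate the lag-1 autocorrelation of a return series: for an idealized random-walk market it should be statistically indistinguishable from zero. The sketch below (illustrative Python using simulated i.i.d. returns, not real market data) shows the baseline case that weak-form efficiency predicts:

```python
import random
import statistics

def lag1_autocorrelation(returns):
    """Sample lag-1 autocorrelation of a return series."""
    mean = statistics.fmean(returns)
    num = sum((returns[i] - mean) * (returns[i + 1] - mean)
              for i in range(len(returns) - 1))
    den = sum((r - mean) ** 2 for r in returns)
    return num / den

random.seed(0)
# Simulated i.i.d. Gaussian returns -- the "no serial dependence" case.
returns = [random.gauss(0.0, 0.01) for _ in range(10_000)]
rho = lag1_autocorrelation(returns)
print(abs(rho) < 0.05)  # near zero for a random-walk market
```

The trending studies cited above amount to finding that this statistic, computed on real index returns over weeks or longer, is systematically positive rather than near zero.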
In semi-strong-form efficiency, it is implied that share prices adjust to publicly available new
information very rapidly and in an unbiased fashion, such that no excess returns can be earned by
trading on that information. Semi-strong-form efficiency implies that neither fundamental
analysis nor technical analysis techniques will be able to reliably produce excess returns. To test
for semi-strong-form efficiency, the adjustments to previously unknown news must be of a
reasonable size and must be instantaneous. The test accordingly looks for consistent upward or
downward adjustments after the initial change. If there are any such adjustments, it
would suggest that investors had interpreted the information in a biased fashion and hence in an
inefficient manner.
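The event-study logic just described can be sketched as follows; the return series, event day, and post-event window are hypothetical, chosen only to illustrate what "no systematic post-event drift" looks like:

```python
import random

random.seed(4)

def cumulative_abnormal_return(returns, event_index, window=5):
    """Sum of post-event abnormal returns; a consistent drift after the
    event would suggest a biased (inefficient) initial reaction."""
    post = returns[event_index + 1 : event_index + 1 + window]
    return sum(post)

# Hypothetical abnormal-return series: one instantaneous jump on the
# event day, then pure noise -- the semi-strong-form efficient pattern.
abnormal = [random.gauss(0.0, 0.002) for _ in range(60)]
abnormal[30] += 0.05  # full adjustment on the announcement day
car = cumulative_abnormal_return(abnormal, event_index=30)
print(abs(car) < 0.03)  # no systematic drift after the jump
```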
In strong-form efficiency, share prices reflect all information, public and private, and no one
can earn excess returns. If there are legal barriers to private information becoming public, as with
insider trading laws, strong-form efficiency is impossible, except in the case where the laws are
universally ignored. To test for strong-form efficiency, a market needs to exist where investors
cannot consistently earn excess returns over a long period of time. Even if some money
managers are consistently observed to beat the market, no refutation even of strong-form
efficiency follows: with hundreds of thousands of fund managers worldwide, even a normal
distribution of returns (as efficiency predicts) should be expected to produce a few dozen "star"
performers.
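The "star performers by luck alone" argument is easy to check numerically; the manager count and horizon below are illustrative assumptions, with each manager's yearly result reduced to a fair coin flip:

```python
import random

random.seed(1)
n_managers = 100_000
n_years = 10

# Skill plays no role here, yet some managers beat the market
# every single year purely by chance.
stars = sum(
    all(random.random() < 0.5 for _ in range(n_years))
    for _ in range(n_managers)
)
print(stars)  # roughly 100_000 / 2**10, i.e. about 98 "star" performers
```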
Criticism and behavioral finance
Investors and researchers have disputed the efficient-market hypothesis both empirically and
theoretically. Behavioral economists attribute the imperfections in financial markets to a
combination of cognitive biases such as overconfidence, overreaction, representative bias,
information bias, and various other predictable human errors in reasoning and information
processing. These have been researched by psychologists such as Daniel Kahneman, Amos
Tversky, Richard Thaler, and Paul Slovic. These errors in reasoning lead most investors to avoid
value stocks and buy growth stocks at expensive prices, which allow those who reason correctly
to profit from bargains in neglected value stocks and the overreacted selling of growth stocks.
Empirical evidence has been mixed, but has generally not supported strong forms of the
efficient-market hypothesis.[7][8][21] According to Dreman, in a 1995 paper, low P/E stocks have
greater returns.[22] In an earlier paper he also refuted the assertion by Ray Ball that these higher
returns could be attributed to higher beta,[23] whose research had been accepted by efficient
market theorists as explaining the anomaly[24] in neat accordance with modern portfolio theory.
One can identify "losers" as stocks that have had poor returns over some number of past years.
"Winners" would be those stocks that had high returns over a similar period. The main result of
one such study is that losers have much higher average returns than winners over the following
period of the same number of years.[25] A later study showed that beta (β) cannot account for this
difference in average returns.[26] This tendency of returns to reverse over long horizons (i.e.,
losers become winners) is yet another contradiction of EMH. Losers would have to have much
higher betas than winners in order to justify the return difference. The study showed that the beta
difference required to save the EMH is just not there.
Speculative economic bubbles are an obvious anomaly, in that the market often appears to be
driven by buyers operating on irrational exuberance, who take little notice of underlying value.
These bubbles are typically followed by an overreaction of frantic selling, allowing shrewd
investors to buy stocks at bargain prices. Rational investors have difficulty profiting by shorting
irrational bubbles because, as John Maynard Keynes commented, "Markets can remain irrational
longer than you can remain solvent."[27] Sudden market crashes as happened on Black Monday in
1987 are mysterious from the perspective of efficient markets, but allowed as a rare statistical
event under the Weak-form of EMH.
Burton Malkiel, a well-known proponent of the general validity of EMH, has warned that certain
emerging markets such as China are not empirically efficient; that the Shanghai and Shenzhen
markets, unlike markets in United States, exhibit considerable serial correlation (price trends),
non-random walk, and evidence of manipulation.[28]
Behavioral psychology approaches to stock market trading are among some of the more
promising alternatives to EMH (and some investment strategies seek to exploit exactly such
inefficiencies). But Nobel Laureate co-founder of the programme—Daniel Kahneman—
announced his skepticism of investors beating the market: "They're [investors] just not going to
do it [beat the market]. It's just not going to happen."[29] Indeed, defenders of EMH maintain that
Behavioral Finance strengthens the case for EMH in that it highlights biases in individuals and
committees and not competitive markets. For example, one prominent finding in Behavioral
Finance is that individuals employ hyperbolic discounting; bonds, mortgages, annuities and other
similar financial instruments subject to competitive market forces palpably do not. Any
manifestation of hyperbolic discounting in the pricing of these obligations would
invite arbitrage thereby quickly eliminating any vestige of individual biases. Similarly,
diversification, derivative securities and other hedging strategies assuage if not eliminate
potential mispricings from the severe risk-intolerance (loss aversion) of individuals underscored
by behavioral finance. On the other hand, economists, behavioral psychologists and mutual fund
managers are drawn from the human population and are therefore subject to the biases that
behavioralists showcase. By contrast, the price signals in markets are far less subject to
individual biases highlighted by the Behavioral Finance programme. Richard Thaler has started a
fund based on his research on cognitive biases. In a 2008 report he identified complexity and
herd behavior as central to the global financial crisis of 2008.[30]
Further empirical work has highlighted the impact transaction costs have on the concept of
market efficiency, with much evidence suggesting that any anomalies pertaining to market
inefficiencies are the result of a cost benefit analysis made by those willing to incur the cost of
acquiring the valuable information in order to trade on it. Additionally the concept of liquidity is
a critical component to capturing "inefficiencies" in tests for abnormal returns. Any test of this
proposition faces the joint hypothesis problem, where it is impossible to ever test for market
efficiency, since to do so requires the use of a measuring stick against which abnormal returns
are compared - one cannot know if the market is efficient if one does not know if a model
correctly stipulates the required rate of return. Consequently, a situation arises where either the
asset pricing model is incorrect or the market is inefficient, but one has no way of knowing
which is the case.[citation needed]
A key work on random walk was done in the late 1980s by Profs. Andrew Lo and Craig
MacKinlay; they argue, in effect, that stock prices do not follow a random walk and never have. Their paper
took almost two years to be accepted by academia, and in 2001 they published "A Non-Random
Walk Down Wall St.", which explained the paper in layman's terms.[citation needed]
Recent financial crisis
The recent global financial crisis has led to renewed scrutiny and criticism of the hypothesis.[31]
Market strategist Jeremy Grantham has stated flatly that EMH is responsible for the current
financial crisis, claiming that belief in the hypothesis caused financial leaders to have a "chronic
underestimation of the dangers of asset bubbles breaking".[2]
At the International Organization of Securities Commissions annual conference, held in June
2009, the hypothesis took center stage. Martin Wolf, the chief economics commentator for the
Financial Times, dismissed the hypothesis as being a useless way to examine how markets
function in reality. Paul McCulley, managing director of PIMCO, was less extreme in his
criticism, saying that the hypothesis had not failed, but was "seriously flawed" in its neglect of
human nature.[32]
The financial crisis has led Richard Posner, a prominent judge, University of Chicago law
professor, and innovator in the field of Law and Economics, to back away from the hypothesis
and express some degree of belief in Keynesian economics. Posner accused some of his 'Chicago
School' colleagues of being "asleep at the switch", saying that "the movement to deregulate the
financial industry went too far by exaggerating the resilience - the self healing powers - of
laissez-faire capitalism."[33] Others, such as Fama himself, said that the theory held up well
during the crisis and that the markets were a casualty of the recession, not the cause of it.
Popular reception
Despite the best efforts of EMH proponents such as Burton Malkiel, whose book A Random
Walk Down Wall Street achieved best-seller status, the EMH has not caught the public's
imagination. Various forms of stock picking, such as active management, are promoted by
popular CNBC commentator Jim Cramer and former Fidelity Investments fund manager Peter
Lynch, whose books and articles have popularised the notion that investors can "beat the
market".[citation needed]
Many believe that EMH says that a security's price is a correct representation of the value of that
business, as calculated by what the business's future returns will actually be. In other words, they
believe that EMH says a stock's price correctly predicts the underlying company's future results.
Since stock prices clearly do not reflect company future results in many cases, many people
reject EMH as clearly wrong.[citation needed]
A discrete-time martingale is a sequence of random variables X1, X2, X3, ... that satisfies, for all n, E(|Xn|) < ∞ and
E(Xn+1 | X1, ..., Xn) = Xn,
i.e., the conditional expected value of the next observation, given all the past observations, is
equal to the last observation.
Somewhat more generally, a sequence Y1, Y2, Y3 ... is said to be a martingale with respect to
another sequence X1, X2, X3 ... if for all n
E(Yn+1 | X1, ..., Xn) = Yn.
This expresses the property that the conditional expectation of an observation at time t, given all
the observations up to time s, is equal to the observation at time s (of course, provided that s ≤ t).
In full generality, a stochastic process Y : T × Ω → S is a martingale with respect to a filtration
Σ∗ and probability measure P if
• Σ∗ is a filtration of the underlying probability space (Ω, Σ, P);
• Y is adapted to the filtration Σ∗, i.e., for each t in the index set T, the random
variable Yt is a Σt-measurable function;
• for each t, Yt lies in the Lp space L1(Ω, Σt, P; S), i.e. E(|Yt|) < ∞;
• for all s and t with s ≤ t, E(Yt | Σs) = Ys.
It is important to note that the property of being a martingale involves both the filtration and the
probability measure (with respect to which the expectations are taken). It is possible that Y could
be a martingale with respect to one measure but not another one; the Girsanov theorem offers a
way to find a measure with respect to which an Itō process is a martingale.
Examples of martingales
• Suppose Xn is a gambler's fortune after n tosses of a fair coin, where the
gambler wins $1 if the coin comes up heads and loses $1 if the coin comes
up tails. The gambler's conditional expected fortune after the next trial, given
the history, is equal to his present fortune, so this sequence is a martingale.
This is also known as D'Alembert system.
• Let Yn = Xn² − n, where Xn is the gambler's fortune from the preceding
example. Then the sequence { Yn : n = 1, 2, 3, ... } is a martingale. This can
be used to show that the gambler's total gain or loss varies roughly between
plus or minus the square root of the number of steps.
• (de Moivre's martingale) Now suppose an "unfair" or "biased" coin, with
probability p of "heads" and probability q = 1 − p of "tails". Let Xn be the
number of heads minus the number of tails after n tosses, and let Yn = (q/p)^Xn.
Then { Yn : n = 1, 2, 3, ... } is a martingale with respect to { Xn }.
• Suppose each amoeba either splits into two amoebas, with probability p, or
eventually dies, with probability 1 − p. Let Xn be the number of amoebas
surviving in the nth generation (in particular Xn = 0 if the population has
become extinct by that time). Let r be the probability of eventual extinction.
(Finding r as function of p is an instructive exercise. Hint: The probability that
the descendants of an amoeba eventually die out is equal to the probability
that either of its immediate offspring dies out, given that the original amoeba
has split.) Then { r^Xn : n = 1, 2, 3, ... } is a martingale with respect to { Xn }.
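The hint above leads to the fixed-point equation r = (1 − p) + p·r², since an amoeba's line dies out exactly when, after a split, both offspring lines independently die out. A short sketch (illustrative Python, not from the article) finds the relevant root by iterating from r = 0, the standard construction for branching processes:

```python
def extinction_probability(p, iterations=10_000):
    """Smallest fixed point of r = (1 - p) + p * r**2, found by
    iterating from r = 0."""
    r = 0.0
    for _ in range(iterations):
        r = (1.0 - p) + p * r * r
    return r

# For p <= 1/2 the population dies out almost surely (r = 1);
# for p > 1/2 the smaller root is r = (1 - p) / p.
print(round(extinction_probability(0.4), 4))  # -> 1.0
print(round(extinction_probability(0.8), 4))  # -> 0.25
```

Iterating from 0 converges to the smaller fixed point, which is the extinction probability; the larger root (always 1 when p > 1/2... rather, always a root) is not the one the process selects.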
The more general definitions of both discrete-time and continuous-time martingales given earlier
can be converted into the corresponding definitions of sub/supermartingales in the same way by
replacing the equality for the conditional expectation by an inequality.
Here is a mnemonic for remembering which is which: "Life is a supermartingale; as time
advances, expectation decreases."
Examples of submartingales and supermartingales
• Every martingale is also a submartingale and a supermartingale. Conversely,
any stochastic process that is both a submartingale and a supermartingale is
a martingale.
• Consider again the gambler who wins $1 when a coin comes up heads and
loses $1 when the coin comes up tails. Suppose now that the coin may be
biased, so that it comes up heads with probability p.
○ If p is equal to 1/2, the gambler on average neither wins nor loses
money, and the gambler's fortune over time is a martingale.
○ If p is less than 1/2, the gambler loses money on average, and the
gambler's fortune over time is a supermartingale.
○ If p is greater than 1/2, the gambler wins money on average, and the
gambler's fortune over time is a submartingale.
• A convex function of a martingale is a submartingale, by Jensen's inequality.
For example, the square of the gambler's fortune in the fair coin game is a
submartingale (which also follows from the fact that Xn² − n is a martingale).
Similarly, a concave function of a martingale is a supermartingale.
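These coin-tossing examples are easy to check by simulation; the sketch below (illustrative Python, with arbitrary sample sizes) verifies that the fair-coin fortune averages to zero, and that Xn² − n also averages to zero, as the martingale property for Yn = Xn² − n predicts:

```python
import random
import statistics

random.seed(2)

def fortune_after(n):
    """Gambler's fortune after n fair tosses: +-$1 per toss, starting at 0."""
    return sum(random.choice((1, -1)) for _ in range(n))

n, trials = 100, 20_000
samples = [fortune_after(n) for _ in range(trials)]

# The fortune itself is a martingale: E[Xn] = X0 = 0.
mean_fortune = statistics.fmean(samples)
# Its square drifts upward like n (a submartingale), so
# Yn = Xn**2 - n is again a martingale: E[Xn**2 - n] = 0.
mean_y = statistics.fmean(x * x - n for x in samples)
print(abs(mean_fortune) < 0.5, abs(mean_y) < 6.0)
```

The typical size of Xn after n steps is about √n, which is the "square root of the number of steps" behaviour mentioned above.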
A stopping time with respect to a sequence of random variables X1, X2, X3, ... is a random
variable τ with the property that for each t, the occurrence or non-occurrence of the event τ = t
depends only on the values of X1, X2, X3, ..., Xt. The intuition behind the definition is that at any
particular time t, you can look at the sequence so far and tell if it is time to stop. An example in
real life might be the time at which a gambler leaves the gambling table, which might be a
function of his previous winnings (for example, he might leave only when he goes broke), but he
can't choose to go or stay based on the outcome of games that haven't been played yet.
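The gambler-leaves-the-table example can be simulated directly; the stopping rule below inspects only the history so far, never future tosses. The starting fortune and target are illustrative (for a fair coin, classical gambler's-ruin reasoning gives success probability start/target):

```python
import random

def play_until_stopped(start, target, rng):
    """Fair-coin gambling; stop when fortune hits 0 (broke) or `target`.
    The stopping rule looks only at the history so far -- a stopping time."""
    fortune = start
    while 0 < fortune < target:
        fortune += rng.choice((1, -1))
    return fortune

rng = random.Random(3)
trials = 10_000
wins = sum(play_until_stopped(3, 10, rng) == 10 for _ in range(trials))
# Gambler's ruin for a fair coin: P(reach 10 before 0 | start at 3) = 3/10.
print(wins / trials)  # close to 0.3
```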
Some mathematicians defined the concept of stopping time by requiring only that the occurrence
or non-occurrence of the event τ = t be probabilistically independent of Xt + 1, Xt + 2, ... but not that
it be completely determined by the history of the process up to time t. That is a weaker condition
than the one appearing in the paragraph above, but is strong enough to serve in some of the
proofs in which stopping times are used.
One of the basic properties of martingales is that if (Xt)t>0 is a (sub-/super-) martingale and τ is a
stopping time, then the corresponding stopped process (Xt∧τ)t>0 is also a (sub-/super-) martingale.
Measure (mathematics)
Informally, a measure has the property of being monotone in the sense that if A is a
subset of B, the measure of A is less than or equal to the measure of B.
Furthermore, the measure of the empty set is required to be 0.
Formally, a measure μ on a σ-algebra Σ over a set X is a function μ : Σ → [0, ∞] satisfying:
• Non-negativity: μ(E) ≥ 0 for all E in Σ;
• Null empty set: μ(∅) = 0;
• Countable additivity (σ-additivity): for every countable collection {Ei} of pairwise
disjoint sets in Σ, μ(∪i Ei) = ∑i μ(Ei).
The second condition may be treated as a special case of countable additivity, if the empty
collection is allowed as a countable collection (and the empty sum is interpreted as 0).
Otherwise, if the empty collection is disallowed (but finite collections are allowed), the second
condition still follows from countable additivity provided, however, that there is at least one set
having finite measure.
The pair (X, Σ) is called a measurable space, the members of Σ are called measurable sets,
and the triple (X, Σ, μ) is called a measure space.
If only the second and third conditions are met, and μ takes on at most one of the values ±∞,
then μ is called a signed measure.
A probability measure is a measure with total measure one (i.e., μ(X) = 1); a probability
space is a measure space with a probability measure.
For measure spaces that are also topological spaces various compatibility conditions can be
placed for the measure and the topology. Most measures met in practice in analysis (and in many
cases also in probability theory) are Radon measures. Radon measures have an alternative
definition in terms of linear functionals on the locally convex space of continuous functions with
compact support. This approach is taken by Bourbaki (2004) and a number of other authors. For
more details see Radon measure.
Properties
Several further properties can be derived from the definition of a countably additive measure.
Monotonicity
A measure μ is monotonic: If E1 and E2 are measurable sets with E1 ⊆ E2, then μ(E1) ≤ μ(E2).
A measure μ is continuous from below: If E1, E2, E3, … are measurable sets and En is a subset of
En + 1 for all n, then the union of the sets En is measurable, and
μ(∪n En) = limn μ(En).
A measure μ is also continuous from above: If E1, E2, E3, … are measurable sets and En + 1 is a
subset of En for all n, and at least one of the En has finite measure, then the intersection of the
sets En is measurable, and
μ(∩n En) = limn μ(En).
This property is false without the assumption that at least one of the En has finite measure. For
instance, for each n ∈ N, let En = [n, ∞) ⊂ R,
which all have infinite Lebesgue measure, but the intersection is empty.
Sigma-finite measures
Main article: Sigma-finite measure
A measure space (X, Σ, μ) is called finite if μ(X) is a finite real number (rather than ∞). It is
called σ-finite if X can be decomposed into a countable union of measurable sets of finite
measure. A set in a measure space has σ-finite measure if it is a countable union of sets with
finite measure.
For example, the real numbers with the standard Lebesgue measure are σ-finite but not finite.
Consider the closed intervals [k, k+1] for all integers k; there are countably many such intervals,
each has measure 1, and their union is the entire real line. Alternatively, consider the real
numbers with the counting measure, which assigns to each finite set of reals the number of points
in the set. This measure space is not σ-finite, because every set with finite measure contains only
finitely many points, and it would take uncountably many such sets to cover the entire real line.
The σ-finite measure spaces have some very convenient properties; σ-finiteness can be compared
in this respect to the Lindelöf property of topological spaces. They can be also thought of as a
vague generalization of the idea that a measure space may have 'uncountable measure'.
Completeness
A measurable set X is called a null set if μ(X)=0. A subset of a null set is called a negligible set.
A negligible set need not be measurable, but every measurable negligible set is automatically a
null set. A measure is called complete if every negligible set is measurable.
A measure can be extended to a complete one by considering the σ-algebra of subsets Y which
differ by a negligible set from a measurable set X, that is, such that the symmetric difference of X
and Y is contained in a null set. One defines μ(Y) to equal μ(X).
Examples
Some important measures are listed here.
• The counting measure is defined by μ(S) = number of elements in S.
• The Lebesgue measure on R is a complete translation-invariant measure on a
σ-algebra containing the intervals in R such that μ([0,1]) = 1; and every other
measure with these properties extends Lebesgue measure.
• Circular angle measure is invariant under rotation.
• The Haar measure for a locally compact topological group is a generalization
of the Lebesgue measure (and also of counting measure and circular angle
measure) and has similar uniqueness properties.
• The Hausdorff measure which is a refinement of the Lebesgue measure to
some fractal sets.
• Every probability space gives rise to a measure which takes the value 1 on
the whole space (and therefore takes all its values in the unit interval [0,1]).
Such a measure is called a probability measure. See probability axioms.
• The Dirac measure δa (cf. Dirac delta function) is given by δa(S) = χS(a),
where χS is the characteristic function of S. The measure of a set is 1 if it
contains the point a and 0 otherwise.
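Several of the listed measures are simple enough to state in code; the sketch below (illustrative Python, restricted to finite sets) implements the Dirac and counting measures and checks finite additivity on disjoint sets:

```python
def dirac_measure(a):
    """Dirac measure delta_a: 1 if the set contains the point a, else 0."""
    return lambda s: 1 if a in s else 0

def counting_measure(s):
    """Counting measure: the number of elements of a finite set."""
    return len(s)

d = dirac_measure(2)
print(d({1, 2, 3}), d({5, 6}))      # 1 0
print(counting_measure({1, 2, 3}))  # 3

# Finite additivity on disjoint sets: mu(A U B) = mu(A) + mu(B).
a, b = {1, 2}, {3}
print(counting_measure(a | b) == counting_measure(a) + counting_measure(b))  # True
```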
Other 'named' measures used in various theories include: Borel measure, Jordan measure,
ergodic measure, Euler measure, Gaussian measure, Baire measure, Radon measure.
Non-measurable sets
Main article: Non-measurable set
If the axiom of choice is assumed to be true, not all subsets of Euclidean space are Lebesgue
measurable; examples of such sets include the Vitali set, and the non-measurable sets postulated
by the Hausdorff paradox and the Banach–Tarski paradox.
Generalizations
Measure (mathematics)
Informally, a measure has the property of being monotone in the sense that if A is a
subset of B, the measure of A is less than or equal to the measure of B.
Furthermore, the measure of the empty set is required to be 0.
Formally, let X be a set and Σ a σ-algebra over X. A function μ from Σ to the extended
real number line is called a measure if it satisfies the following properties:
• Non-negativity: μ(E) ≥ 0 for all E in Σ
• Null empty set: μ(∅) = 0
• Countable additivity (or σ-additivity): for all countable collections {Ei} of pairwise
disjoint sets in Σ, μ(∪i Ei) = Σi μ(Ei)
The second condition may be treated as a special case of countable additivity, if the empty
collection is allowed as a countable collection (and the empty sum is interpreted as 0).
Otherwise, if the empty collection is disallowed (but finite collections are allowed), the second
condition still follows from countable additivity provided, however, that there is at least one set
having finite measure.
The pair (X, Σ) is called a measurable space, the members of Σ are called measurable sets,
and the triple (X, Σ, μ) is called a measure space.
If only the second and third conditions are met, and μ takes on at most one of the values ±∞,
then μ is called a signed measure.
A probability measure is a measure with total measure one (i.e., μ(X) = 1); a probability
space is a measure space with a probability measure.
For measure spaces that are also topological spaces various compatibility conditions can be
placed for the measure and the topology. Most measures met in practice in analysis (and in many
cases also in probability theory) are Radon measures. Radon measures have an alternative
definition in terms of linear functionals on the locally convex space of continuous functions with
compact support. This approach is taken by Bourbaki (2004) and a number of other authors. For
more details see Radon measure.
Properties
Several further properties can be derived from the definition of a countably additive measure.
Monotonicity
A measure μ is monotonic: if E1 and E2 are measurable sets with E1 ⊆ E2, then μ(E1) ≤ μ(E2).
A measure μ is continuous from below: if E1, E2, E3, … are measurable sets and En is a subset of
En + 1 for all n, then the union of the sets En is measurable, and μ(∪n En) = limn→∞ μ(En).
A measure μ is continuous from above: if E1, E2, E3, … are measurable sets and En + 1 is a
subset of En for all n, and at least one of the En has finite measure, then the intersection of the
sets En is measurable, and μ(∩n En) = limn→∞ μ(En). This property is false without the
assumption that at least one of the En has finite measure. For instance, for each n ∈ N, let
En = [n, ∞) ⊂ R,
which all have infinite Lebesgue measure, but the intersection is empty.
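The role of the finiteness assumption can be made concrete with a small numeric sketch (illustrative only; intervals stand in for general measurable sets):

```python
import math

def interval_measure(a, b):
    """Lebesgue measure of the interval [a, b); b may be math.inf."""
    return b - a

# Continuity from below: E_n = [0, n) is an increasing sequence and
# mu(E_n) = n grows to mu([0, inf)) = inf, as the property predicts.
below = [interval_measure(0, n) for n in (1, 2, 3)]

# Continuity from above, by contrast, needs at least one set of finite
# measure: E_n = [n, inf) all have infinite measure, yet their
# intersection is empty (measure 0), so the limit of the measures
# (infinity) does not equal the measure of the intersection.
above = [interval_measure(n, math.inf) for n in (1, 2, 3)]
```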
Sigma-finite measures
Main article: Sigma-finite measure
A measure space (X, Σ, μ) is called finite if μ(X) is a finite real number (rather than ∞). It is
called σ-finite if X can be decomposed into a countable union of measurable sets of finite
measure. A set in a measure space has σ-finite measure if it is a countable union of sets with
finite measure.
For example, the real numbers with the standard Lebesgue measure are σ-finite but not finite.
Consider the closed intervals [k, k+1] for all integers k; there are countably many such intervals,
each has measure 1, and their union is the entire real line. Alternatively, consider the real
numbers with the counting measure, which assigns to each finite set of reals the number of points
in the set. This measure space is not σ-finite, because every set with finite measure contains only
finitely many points, and it would take uncountably many such sets to cover the entire real line.
The σ-finite measure spaces have some very convenient properties; σ-finiteness can be compared
in this respect to the Lindelöf property of topological spaces. They can also be thought of as a
vague generalization of the idea that a measure space may have 'uncountable measure'.
Completeness
A measurable set X is called a null set if μ(X)=0. A subset of a null set is called a negligible set.
A negligible set need not be measurable, but every measurable negligible set is automatically a
null set. A measure is called complete if every negligible set is measurable.
A measure can be extended to a complete one by considering the σ-algebra of subsets Y which
differ by a negligible set from a measurable set X, that is, such that the symmetric difference of X
and Y is contained in a null set. One defines μ(Y) to equal μ(X).
Examples
Some important measures are listed here.
• The counting measure is defined by μ(S) = number of elements in S.
• The Lebesgue measure on R is a complete translation-invariant measure on a
σ-algebra containing the intervals in R such that μ([0,1]) = 1; and every other
measure with these properties extends Lebesgue measure.
• Circular angle measure is invariant under rotation.
• The Haar measure for a locally compact topological group is a generalization
of the Lebesgue measure (and also of counting measure and circular angle
measure) and has similar uniqueness properties.
• The Hausdorff measure is a refinement of the Lebesgue measure to some
fractal sets.
• Every probability space gives rise to a measure which takes the value 1 on
the whole space (and therefore takes all its values in the unit interval [0,1]).
Such a measure is called a probability measure. See probability axioms.
• The Dirac measure δa (cf. Dirac delta function) is given by δa(S) = χS(a),
where χS is the characteristic function of S. The measure of a set is 1 if it
contains the point a and 0 otherwise.
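Two of the measures listed above are simple enough to state directly in code. The sketch below (illustrative, restricted to finite sets) implements the counting measure and the Dirac measure and checks the null-empty-set and additivity properties:

```python
def counting_measure(S):
    """mu(S) = number of elements in S (finite sets only)."""
    return len(S)

def dirac_measure(a, S):
    """delta_a(S) = 1 if S contains the point a, and 0 otherwise."""
    return 1 if a in S else 0

# Null empty set: mu(emptyset) = 0 for both measures.
assert counting_measure(set()) == 0
assert dirac_measure(3, set()) == 0

# Additivity on disjoint sets: mu(A u B) = mu(A) + mu(B).
A, B = {1, 2}, {5, 6, 7}
assert counting_measure(A | B) == counting_measure(A) + counting_measure(B)
assert dirac_measure(5, A | B) == dirac_measure(5, A) + dirac_measure(5, B)
```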
Other 'named' measures used in various theories include: Borel measure, Jordan measure,
ergodic measure, Euler measure, Gaussian measure, Baire measure, Radon measure.
Non-measurable sets
Main article: Non-measurable set
If the axiom of choice is assumed to be true, not all subsets of Euclidean space are Lebesgue
measurable; examples of such sets include the Vitali set, and the non-measurable sets postulated
by the Hausdorff paradox and the Banach–Tarski paradox.
Generalizations
For certain purposes, it is useful to have a "measure" whose values are not restricted to the non-
negative reals or infinity. For instance, a countably additive set function with values in the
(signed) real numbers is called a signed measure, while such a function with values in the
complex numbers is called a complex measure. Measures that take values in Banach spaces have
been studied extensively. A measure that takes values in the set of self-adjoint projections on a
Hilbert space is called a projection-valued measure; these are used mainly in functional analysis
for the spectral theorem. When it is necessary to distinguish the usual measures which take non-
negative values from generalizations, the term positive measure is used. Positive measures are
closed under conical combination but not general linear combination, while signed measures are
the linear closure of positive measures.
Another generalization is the finitely additive measure, which is sometimes called a content.
This is the same as a measure except that instead of requiring countable additivity we require
only finite additivity. Historically, this definition was used first, but proved to be not so useful. It
turns out that in general, finitely additive measures are connected with notions such as Banach
limits, the dual of L∞ and the Stone–Čech compactification. All these are linked in one way or
another to the axiom of choice.
A charge is a generalization in both directions: it is a finitely additive, signed measure.
The remarkable result in integral geometry known as Hadwiger's theorem states that the space of
translation-invariant, finitely additive, not-necessarily-nonnegative set functions defined on finite
unions of compact convex sets in R^n consists (up to scalar multiples) of one "measure" that is
"homogeneous of degree k" for each k = 0, 1, 2, ..., n, and linear combinations of those
"measures". "Homogeneous of degree k" means that rescaling any set by any factor c > 0
multiplies the set's "measure" by c^k. The one that is homogeneous of degree n is the ordinary n-
dimensional volume. The one that is homogeneous of degree n − 1 is the "surface volume". The
one that is homogeneous of degree 1 is a mysterious function called the "mean width", a
misnomer. The one that is homogeneous of degree 0 is the Euler characteristic.
Tail risk
Tail risk is the risk of an asset or portfolio of assets moving more than 3 standard deviations from
its current price, as described by its probability density function.[1] This risk is often
underestimated by standard statistical methods for calculating the probability of changes in the
price of financial assets. The normal distribution, which can be used for calculating the
probability of sudden asset price changes, is particularly prone to this type of error; however,
many if not most types of analysis are prone to it to a lesser degree.[2]
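The understatement can be quantified under the normal assumption. A short sketch (illustrative; the 3-standard-deviation threshold comes from the definition above):

```python
import math

def normal_two_tail(k):
    """P(|Z| > k) for a standard normal variable Z, computed via the
    complementary error function."""
    return math.erfc(k / math.sqrt(2))

# Under normality, a move beyond 3 standard deviations has probability
# of roughly 0.27% -- about once in 370 observations. Empirical asset
# returns exceed this threshold far more often, which is why normal
# models underestimate tail risk.
p = normal_two_tail(3)
```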
Liquidity risk
In finance, liquidity risk is the risk that a given security or asset cannot be traded quickly
enough in the market to prevent a loss (or make the required profit).
Types of Liquidity Risk
Asset liquidity - An asset cannot be sold due to lack of liquidity in the market - essentially a
sub-set of market risk. This can be accounted for by:
• Widening bid/offer spread
• Making explicit liquidity reserves
• Lengthening holding period for VaR calculations
Funding liquidity - Risk that liabilities:
• Cannot be met when they fall due
• Can only be met at an uneconomic price
• Can be name-specific or systemic
Case Studies
Amaranth Advisors LLC - 2006
Amaranth Advisors lost roughly $6bn in the natural gas futures market back in September 2006.
Amaranth had a concentrated, undiversified position in its natural gas strategy. The trader had
used leverage to build a very large position. Amaranth’s positions were staggeringly large,
representing around 10% of the global market in natural gas futures.[6] Chincarini notes that firms
need to manage liquidity risk explicitly. The inability to sell a futures contract at or near the
latest quoted price is related to one’s concentration in the security. In Amaranth’s case, the
concentration was far too high and there were no natural counterparties when they needed to
unwind the positions.[7] Chincarini (2006) argues that part of the loss Amaranth incurred was due
to asset illiquidity. Regression analysis of the 3-week return on natural gas futures contracts from
August 31, 2006 to September 21, 2006 against the excess open interest suggested that contracts
whose open interest on August 31, 2006 was much higher than the historical normalized value
experienced larger negative returns.[8]
Northern Rock - 2007
Main article: Nationalisation of Northern Rock
Northern Rock suffered from funding liquidity risk back in September 2007 following the
subprime crisis. The firm suffered from liquidity issues despite being solvent at the time, because
maturing loans and deposits could not be renewed in the short-term money markets.[9] In
response, the FSA now places greater supervisory focus on liquidity risk, especially with regard
to "high-impact retail firms".[10]
LTCM - 1998
Long-Term Capital Management (LTCM) was bailed out by a consortium of 14 banks in 1998
after being caught in a cash-flow crisis when economic shocks resulted in excessive mark-to-
market losses and margin calls. The fund suffered from a combination of funding and asset
liquidity risk. The asset liquidity risk arose from LTCM's failure to account for liquidity
becoming more valuable (as it did following the crisis). Since much of its balance sheet was
exposed to liquidity risk premium, its short positions would increase in price relative to its long
positions. This was
essentially a massive, unhedged exposure to a single risk factor.[11] LTCM had been aware of
funding liquidity risk. Indeed, they estimated that in times of severe stress, haircuts on AAA-
rated commercial mortgages would increase from 2% to 10%, and similarly for other securities.
In response to this, LTCM had negotiated long-term financing with margins fixed for several
weeks on many of their collateralized loans. Due to an escalating liquidity spiral, LTCM could
ultimately not fund its positions in spite of its numerous measures to control funding risk.[12]
Pull to par
Pull to Par is the effect in which the price of a bond converges to par value as time passes. At
maturity the price of a debt instrument in good standing should equal its par (or face value).
Another name for this effect is reduction of maturity.
It results from the difference between market interest rate and the nominal yield on the bond.
The Pull to Par effect is one of two factors that influence the market value of the bond and its
volatility (the second one is the level of market interest rates).
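The pull-to-par effect can be illustrated numerically by pricing a bond as the present value of its cash flows and shrinking the time to maturity while holding the market rate fixed (a sketch with hypothetical numbers; annual coupons assumed):

```python
def bond_price(face, coupon_rate, market_yield, years_left):
    """Price of an annual-coupon bond as the present value of its
    remaining coupons plus the discounted face value."""
    coupon = face * coupon_rate
    pv_coupons = sum(coupon / (1 + market_yield) ** t
                     for t in range(1, years_left + 1))
    pv_face = face / (1 + market_yield) ** years_left
    return pv_coupons + pv_face

# A 5% bond when the market rate is 7%: priced below par, and the
# discount shrinks ("pulls to par") as maturity approaches.
prices = [bond_price(100, 0.05, 0.07, n) for n in (5, 3, 1, 0)]
```

With the coupon below the market yield the bond trades at a discount that converges to zero; with the coupon above the market yield the convergence to par is from above instead.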
Yield curve
The US dollar yield curve as of 9 February 2005. The curve has a typical upward
sloping shape.
This article is about yield curves as used in finance. For the term's use in physics,
see Yield curve (physics).
In finance, the yield curve is the relation between the interest rate (or cost of borrowing) and the
time to maturity of the debt for a given borrower in a given currency. For example, the U.S.
dollar interest rates paid on U.S. Treasury securities for various maturities are closely watched by
many traders, and are commonly plotted on a graph such as the one on the right which is
informally called "the yield curve." More formal mathematical descriptions of this relation are
often called the term structure of interest rates.
The yield of a debt instrument is the overall rate of return available on the investment. For
instance, a bank account that pays an interest rate of 4% per year has a 4% yield. In general the
percentage per year that can be earned is dependent on the length of time that the money is
invested. For example, a bank may offer a "savings rate" higher than the normal checking
account rate if the customer is prepared to leave money untouched for five years. Investing for a
period of time t gives a yield Y(t).
This function Y is called the yield curve, and it is often, but not always, an increasing function of
t. Yield curves are used by fixed income analysts, who analyze bonds and related securities, to
understand conditions in financial markets and to seek trading opportunities. Economists use the
curves to understand economic conditions.
The yield curve function Y is actually only known with certainty for a few specific maturity
dates, while the other maturities are calculated by interpolation (see Construction of the full yield
curve from market data below).
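A minimal sketch of that interpolation step, assuming simple linear interpolation between quoted points (real curve construction typically uses bootstrapping or spline methods; the quotes below are hypothetical):

```python
def interpolate_yield(curve, t):
    """Linearly interpolate a yield for maturity t (in years) from a
    list of (maturity, yield) points sorted by maturity; maturities
    outside the quoted range are flat-extrapolated."""
    if t <= curve[0][0]:
        return curve[0][1]
    if t >= curve[-1][0]:
        return curve[-1][1]
    for (t0, y0), (t1, y1) in zip(curve, curve[1:]):
        if t0 <= t <= t1:
            w = (t - t0) / (t1 - t0)
            return y0 + w * (y1 - y0)

# Hypothetical quotes: 2y at 4.0%, 5y at 4.4%, 10y at 4.7%.
quotes = [(2, 0.040), (5, 0.044), (10, 0.047)]
y7 = interpolate_yield(quotes, 7)   # between the 5y and 10y points
```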
The typical shape of the yield curve
The British pound yield curve as of 9 February 2005. This curve is unusual in that
long-term rates are lower than short-term ones.
Yield curves are usually upward sloping asymptotically: the longer the maturity, the higher the
yield, with diminishing marginal increases (that is, as one moves to the right, the curve flattens
out). There are two common explanations for upward sloping yield curves. First, it may be that
the market is anticipating a rise in the risk-free rate. If investors hold off investing now, they may
receive a better rate in the future. Therefore, under the arbitrage pricing theory, investors who are
willing to lock their money in now need to be compensated for the anticipated rise in rates—thus
the higher interest rate on long-term investments.
However, interest rates can fall just as they can rise. Another explanation is that longer maturities
entail greater risks for the investor (i.e. the lender). A risk premium is needed by the market,
since at longer durations there is more uncertainty and a greater chance of catastrophic events
that impact the investment. This explanation depends on the notion that the economy faces more
uncertainties in the distant future than in the near term. This effect is referred to as the liquidity
spread. If the market expects more volatility in the future, even if interest rates are anticipated to
decline, the increase in the risk premium can influence the spread and cause an increasing yield.
The opposite position (short-term interest rates higher than long-term) can also occur. For
instance, in November 2004, the yield curve for UK Government bonds was partially inverted.
The yield for the 10 year bond stood at 4.68%, but was only 4.45% for the 30 year bond. The
market's anticipation of falling interest rates causes such incidents. Negative liquidity premiums
can exist if long-term investors dominate the market, but the prevailing view is that a positive
liquidity premium dominates, so only the anticipation of falling interest rates will cause an
inverted yield curve. Strongly inverted yield curves have historically preceded economic
depressions.
The shape of the yield curve is influenced by supply and demand: for instance, if there is a large
demand for long bonds, say from pension funds matching their fixed liabilities to pensioners,
and not enough bonds in existence to meet this demand, then the yields on long bonds can be
expected to be low, irrespective of market participants' views about future events.
The yield curve may also be flat or hump-shaped, due to anticipated interest rates being steady,
or short-term volatility outweighing long-term volatility.
Yield curves move continually whenever the markets are open, reflecting the market's
reaction to news. A further "stylized fact" is that yield curves tend to move in parallel (i.e., the
yield curve shifts up and down as interest rate levels rise and fall).
Types of yield curve
There is no single yield curve describing the cost of money for everybody. The most important
factor in determining a yield curve is the currency in which the securities are denominated. The
economic position of the countries and companies using each currency is a primary factor in
determining the yield curve. Different institutions borrow money at different rates, depending on
their creditworthiness. The yield curves corresponding to the bonds issued by governments in
their own currency are called the government bond yield curve (government curve). Banks with
high credit ratings (Aa/AA or above) borrow money from each other at the LIBOR rates. These
yield curves are typically a little higher than government curves. They are the most important
and widely used in the financial markets, and are known variously as the LIBOR curve or the
swap curve. The construction of the swap curve is described below.
Besides the government curve and the LIBOR curve, there are corporate (company) curves.
These are constructed from the yields of bonds issued by corporations. Since corporations have
less creditworthiness than most governments and most large banks, these yields are typically
higher. Corporate yield curves are often quoted in terms of a "credit spread" over the relevant
swap curve. For instance the five-year yield curve point for Vodafone might be quoted as LIBOR
+0.25%, where 0.25% (often written as 25 basis points or 25bps) is the credit spread.
Normal yield curve
From the post-Great Depression era to the present, the yield curve has usually been "normal",
meaning that yields rise as maturity lengthens (i.e., the slope of the yield curve is positive). This
positive slope reflects investor expectations for the economy to grow in the future and,
importantly, for this growth to be associated with a greater expectation that inflation will rise in
the future rather than fall. This expectation of higher inflation leads to expectations that the
central bank will tighten monetary policy by raising short term interest rates in the future to slow
economic growth and dampen inflationary pressure. It also creates a need for a risk premium
associated with the uncertainty about the future rate of inflation and the risk this poses to the
future value of cash flows. Investors price these risks into the yield curve by demanding higher
yields for maturities further into the future.
However, a positively sloped yield curve has not always been the norm. Through much of the
19th century and early 20th century the US economy experienced trend growth with persistent
deflation, not inflation. During this period the yield curve was typically inverted, reflecting the
fact that deflation made current cash flows less valuable than future cash flows. During this
period of persistent deflation, a 'normal' yield curve was negatively sloped.
Steep yield curve
Historically, the 20-year Treasury bond yield has averaged approximately two percentage points
above that of three-month Treasury bills. In situations when this gap increases (e.g. 20-year
Treasury yield rises higher than the three-month Treasury yield), the economy is expected to
improve quickly in the future. This type of curve can be seen at the beginning of an economic
expansion (or after the end of a recession). Here, economic stagnation will have depressed short-
term interest rates; however, rates begin to rise once the demand for capital is re-established by
growing economic activity.
In January 2010, the gap between yields on two-year Treasury notes and 10-year notes widened
to 2.90 percentage points, its highest ever.
Flat or humped yield curve
A flat yield curve is observed when all maturities have similar yields, whereas a humped curve
results when short-term and long-term yields are equal and medium-term yields are higher than
those of the short-term and long-term. A flat curve sends signals of uncertainty in the economy.
This mixed signal can revert to a normal curve or could later result in an inverted curve. It
cannot be explained by the Segmented Market theory discussed below.
Inverted yield curve
An inverted yield curve occurs when long-term yields fall below short-term yields. Under
unusual circumstances, long-term investors will settle for lower yields now if they think the
economy will slow or even decline in the future. An inverted curve has indicated a worsening
economic situation in the future 6 out of 7 times since 1970.[citation needed] The New York Federal
Reserve regards it as a valuable forecasting tool in predicting recessions two to six quarters
ahead. In addition to potentially signaling an economic decline, inverted yield curves also imply
that the market believes inflation will remain low. This is because, even if there is a recession, a
low bond yield will still be offset by low inflation. However, technical factors, such as a flight to
quality or global economic or currency situations, may cause an increase in demand for bonds on
the long end of the yield curve, causing long-term rates to fall. This was seen in 1998 during the
Long Term Capital Management failure when there was a slight inversion on part of the curve.
Theory
There are several economic theories attempting to explain how yields vary with maturity. Two
of the theories are extreme positions, while a third attempts to find a middle ground between
the former two.
Market expectations (pure expectations) hypothesis
This hypothesis assumes that the various maturities are perfect substitutes and suggests that the
shape of the yield curve depends on market participants' expectations of future interest rates.
These expected rates, along with an assumption that arbitrage opportunities will be minimal, are
enough information to construct a complete yield curve. For example, if investors have an
expectation of what 1-year interest rates will be next year, the 2-year interest rate can be
calculated as the compounding of this year's interest rate by next year's interest rate. More
generally, rates on a long-term instrument are equal to the geometric mean of the yield on a
series of short-term instruments. This theory perfectly explains the observation that yields
usually move together. However, it fails to explain the persistence in the shape of the yield
curve.
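The compounding relation described above can be sketched in a few lines (hypothetical rates; under the hypothesis the long rate is the geometric mean of the expected short rates):

```python
def implied_long_rate(short_rates):
    """Long rate implied by the pure expectations hypothesis: the
    geometric mean of the expected gross short rates, minus one."""
    gross = 1.0
    for r in short_rates:
        gross *= 1 + r
    return gross ** (1 / len(short_rates)) - 1

# If this year's 1-year rate is 3% and next year's is expected at 5%,
# the 2-year rate y2 should satisfy (1 + y2)^2 = 1.03 * 1.05.
y2 = implied_long_rate([0.03, 0.05])
```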
Shortcomings of the expectations theory: it neglects the risks inherent in investing in bonds
(because forward rates are not perfect predictors of future rates), namely 1) interest rate risk and
2) reinvestment rate risk.
Liquidity preference theory
The Liquidity Preference Theory, also known as the Liquidity Premium Theory, is an offshoot of
the Pure Expectations Theory. The Liquidity Preference Theory asserts that long-term interest
rates not only reflect investors’ assumptions about future interest rates but also include a
premium for holding long-term bonds (investors prefer short term bonds to long term bonds),
called the term premium or the liquidity premium. This premium compensates investors for the
added risk of having their money tied up for a longer period, including the greater price
uncertainty. Because of the term premium, long-term bond yields tend to be higher than short-
term yields, and the yield curve slopes upward. Long-term yields are also higher not just because
of the liquidity premium, but also because of the risk premium added by the risk of default from
holding a security over the long term. When the market expectations hypothesis is combined
with the liquidity preference theory, the long-term rate equals the geometric mean of the
expected short-term rates plus a liquidity premium that increases with maturity.
Risk-neutral measure
In mathematical finance, a risk-neutral measure, equivalent martingale measure, or Q-
measure is a probability measure that results when one assumes that the current value of all
financial assets is equal to the expected value of the future payoff of the asset discounted at the
risk-free rate. The concept is used in the pricing of derivatives.
Idea
In an actual economy, prices of assets depend crucially on their risk. Investors typically demand
payment for bearing uncertainty. Therefore, today's price of a claim on a risky amount realised
tomorrow will generally differ from its expected value. Most commonly,[1] investors are risk-
averse and today's price is below the expectation, remunerating those who bear the risk.
To price assets, consequently, the calculated expected values need to be adjusted for the risk
involved (see also Sharpe ratio).
It turns out that, under certain weak conditions (absence of arbitrage), there is an alternative way to do
this calculation: Instead of first taking the expectation and then adjusting for risk, one can first
adjust the probabilities of future outcomes such that they incorporate the effects of risk, and then
take the expectation under those different probabilities. Those adjusted, 'virtual' probabilities are
called risk-neutral probabilities; together they constitute the risk-neutral measure.
It is important to note that clearly the probabilities over asset outcomes in the real world cannot
be impacted; the constructed probabilities are counterfactual. They are only computed because
the second way of pricing, called risk-neutral pricing, is often much simpler to calculate than the
first.
The main benefit stems from the fact that once the risk-neutral probabilities are found, every
asset can be priced by simply taking its expected payoff (i.e. calculating as if investors were risk
neutral). If we used the real-world, physical probabilities, every security would require a
different adjustment (as they differ in riskiness).
Note that under the risk-neutral measure all assets have the same expected rate of return, the risk-
free rate (or short rate). This does not imply the assumption that investors are risk neutral. On
the contrary, the point is to price assets given exactly the risk aversion we observe in the physical
world. Towards that aim, we hypothesize a parallel universe where everybody is risk
neutral. The risk-neutral measure is the probability measure of that parallel universe where all
claims have exactly the prices they have in our real world.
Mathematically, adjusting the probabilities is a measure transformation to an equivalent
martingale measure; it is possible if there are no arbitrage opportunities. If the markets are
complete, the risk-neutral measure is unique.
Often, the physical measure is called P, and the risk-neutral one Q. The term physical measure
is sometimes abused to denote the Lebesgue measure or, occasionally, the measure induced by
the corresponding normal density with respect to the Lebesgue measure.
Usage
Risk-neutral measures make it easy to express the value of a derivative in a formula. Suppose at
a future time T a derivative (e.g., a call option on a stock) pays HT units, where HT is a random
variable on the probability space describing the market. Further suppose that the discount factor
from now (time zero) until time T is P(0,T). Then today's fair value of the derivative is
H0 = P(0,T) EQ(HT),
where the risk-neutral measure is denoted by Q. This can be re-stated in terms of the physical
measure P as
H0 = P(0,T) EP((dQ/dP) HT),
where dQ/dP is the Radon–Nikodym derivative of Q with respect to P.
Another name for the risk-neutral measure is the equivalent martingale measure. If in a financial
market there is just one risk-neutral measure, then there is a unique arbitrage-free price for each
asset in the market. This is the fundamental theorem of arbitrage-free pricing. If there are
more such measures, then in an interval of prices no arbitrage is possible. If no equivalent
martingale measure exists, arbitrage opportunities do.
Example 1 — Binomial model of stock prices
Consider a single-period binomial model in which a stock moves up by a factor u or down by a
factor d over the period, and let r be the risk-free rate (with d < 1 + r < u). The risk-neutral
probability of an up-move is q = (1 + r − d)/(u − d). Given a derivative with payoff Xu when the
stock price moves up and Xd when it goes down, we can price the derivative via its discounted
expected payoff under q:
X0 = (q Xu + (1 − q) Xd)/(1 + r).
Example 2 — Brownian motion
Suppose the stock price follows dSt = μSt dt + σSt dWt, where Wt is a standard Brownian
motion with respect to the physical measure. If we define
W̃t = Wt + ((μ − r)/σ) t,
Girsanov's theorem states that there exists a measure Q under which W̃t is a Brownian motion.
Q is the unique risk-neutral measure for the model, and the (discounted) payoff process of a
derivative is a martingale under Q.
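A sketch of single-period binomial pricing with the payoffs Xu and Xd described above, using hypothetical parameters (u, d, and r chosen only for illustration):

```python
def risk_neutral_price(u, d, r, Xu, Xd):
    """One-period binomial pricing: the discounted expectation of the
    payoff under the risk-neutral up-probability q = (1 + r - d)/(u - d)."""
    q = (1 + r - d) / (u - d)
    assert 0 <= q <= 1, "no-arbitrage requires d <= 1 + r <= u"
    return (q * Xu + (1 - q) * Xd) / (1 + r)

# Hypothetical example: the stock moves up by a factor u = 1.2 or down
# by d = 0.9, the risk-free rate is 5%, and the derivative pays 20 in
# the up state and 0 in the down state (an at-the-money call on a
# stock priced at 100).
price = risk_neutral_price(1.2, 0.9, 0.05, 20, 0)
```

Note that the physical probabilities of the up and down moves never enter the calculation; only u, d, and r do.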
Notes
1. ^ At least in large financial markets. Examples of risk-seeking markets are
casinos and lotteries.
Growth investing
Growth investing is a style of investment strategy. Those who follow this style, known as
growth investors, invest in companies that exhibit signs of above-average growth, even if the
share price appears expensive in terms of metrics such as price-to-earnings or price-to-book
ratios. In typical usage, the term "growth investing" contrasts with the strategy known as value
investing.
However, some notable investors such as Warren Buffett have stated that there is no theoretical
difference between the concepts of value and growth ("Growth and Value Investing are joined at
the hip"), in consideration of the concept of an asset's intrinsic value. In addition, when just
investing in one style of stocks, diversification could be negatively impacted.
Thomas Rowe Price, Jr. has been called "the father of growth investing".[1]
Growth at reasonable price
After the bursting of the dotcom bubble, "growth at any price" has fallen from favour. Attaching a
high price to a security in the hope of high growth may be risky, since if the growth rate fails to
live up to expectations, the price of the security can plummet. It is often more fashionable now to
seek out stocks with high growth rates that are trading at reasonable valuations.
Growth investment vehicles
There are many ways to execute a growth investment strategy. Some of these include:
• Emerging markets
• Recovery shares
• Blue chips
• Internet and technology stocks
• Smaller companies
• Special situations
• Second-hand life policies
Value investing
Value investing is an investment paradigm that derives from the ideas on investment and
speculation that Ben Graham and David Dodd began teaching at Columbia Business School in 1928
and subsequently developed in their 1934 text Security Analysis. Although value investing has
taken many forms since its inception, it generally involves buying securities whose shares appear
underpriced by some form(s) of fundamental analysis.[1] As examples, such securities may be
stock in public companies that trade at discounts to book value or tangible book value, have high
dividend yields, have low price-to-earnings multiples or have low price-to-book ratios.
High-profile proponents of value investing, including Berkshire Hathaway chairman Warren
Buffett, have argued that the essence of value investing is buying stocks at less than their
intrinsic value.[2] The discount of the market price to the intrinsic value is what Benjamin
Graham called the "margin of safety". The intrinsic value is the discounted value of all future
distributions.
However, the future distributions and the appropriate discount rate can only be estimated.
Warren Buffett has taken the value investing concept even further as his thinking has evolved to
where for the last 25 years or so his focus has been on "finding an outstanding company at a
sensible price" rather than generic companies at a bargain price.
[edit]History
[edit]Benjamin Graham
Value investing was established by Benjamin Graham and David Dodd, both professors at
Columbia Business School and teachers of many famous investors. In Graham's book The
Intelligent Investor, he advocated the important concept of margin of safety — first introduced in
Security Analysis, a 1934 book he coauthored with David Dodd — which calls for a cautious
approach to investing. In terms of picking stocks, he recommended defensive investment in
stocks trading below their tangible book value as a safeguard to adverse future developments
often encountered in the stock market.
[edit]Further evolution
However, the concept of value (as well as "book value") has evolved significantly since the
1970s. Book value is most useful in industries where most assets are tangible. Intangible assets
such as patents, software, brands, or goodwill are difficult to quantify, and may not survive the
break-up of a company. When an industry is going through fast technological advancements, the
value of its assets is not easily estimated. Sometimes, the production power of an asset can be
significantly reduced due to competitive disruptive innovation and therefore its value can suffer
permanent impairment. One good example of decreasing asset value is a personal computer. An
example of where book value does not mean much is the service and retail sectors. One modern
model of calculating value is the discounted cash flow model (DCF). The value of an asset is the
sum of its future cash flows, discounted back to the present.
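The DCF calculation just described can be sketched in a few lines of Python; the cash flows and
discount rate below are hypothetical illustration values, not drawn from any real company:

```python
def discounted_cash_flow(cash_flows, discount_rate):
    """Present value of a series of future cash flows.

    cash_flows[t] is the cash flow received t+1 periods from now;
    each is discounted back to the present at discount_rate per period.
    """
    return sum(cf / (1 + discount_rate) ** (t + 1)
               for t, cf in enumerate(cash_flows))

# Hypothetical example: 100 per year for 3 years, discounted at 10%
value = discounted_cash_flow([100, 100, 100], 0.10)
print(round(value, 2))  # 248.69
```

The discount rate here plays the same role as the "appropriate discount rate" discussed above:
small changes in it produce large changes in the computed intrinsic value.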
[edit]Value investing performance
[edit]Performance, value strategies
Value investing has proven to be a successful investment strategy. There are several ways to
evaluate its success. One way is to examine the performance of simple value strategies, such as
buying low PE ratio stocks, low price-to-cash-flow ratio stocks, or low price-to-book ratio
stocks. Numerous academics have published studies investigating the effects of buying value
stocks. These studies have consistently found that value stocks outperform growth stocks and the
market as a whole.[3][4][5]
[edit]Performance, value investors
Another way to examine the performance of value investing strategies is to examine the
investing performance of well-known value investors. Simply examining the performance of the
best known value investors would not be instructive, because investors do not become well
known unless they are successful. This introduces a selection bias. A better way to investigate
the performance of a group of value investors was suggested by Warren Buffett, in his May 17,
1984 speech that was published as The Superinvestors of Graham-and-Doddsville. In this
speech, Buffett examined the performance of those investors who worked at Graham-Newman
Corporation and were thus most influenced by Benjamin Graham. Buffett's conclusion is
identical to that of the academic research on simple value investing strategies: value investing is,
on average, successful in the long run.
During about a 25-year period (1965-90), published research and articles in leading journals of
the value ilk were few. Warren Buffett once commented, "You couldn't advance in a finance
department in this country unless you taught that the world was flat."[6]
[edit]Well known value investors
Benjamin Graham is regarded by many to be the father of value investing. Along with David
Dodd, he wrote Security Analysis, first published in 1934. The most lasting contribution of this
book to the field of security analysis was to emphasize the quantifiable aspects of security
analysis (such as the evaluations of earnings and book value) while minimizing the importance
of more qualitative factors such as the quality of a company's management. Graham later wrote
The Intelligent Investor, a book that brought value investing to individual investors. Aside from
Buffett, many of Graham's other students, such as William J. Ruane, Irving Kahn and Charles
Brandes have gone on to become successful investors in their own right.
Graham's most famous student, however, is Warren Buffett, who ran successful investing
partnerships before closing them in 1969 to focus on running Berkshire Hathaway. Charlie
Munger joined Buffett at Berkshire Hathaway in the 1970s and has since worked as Vice
Chairman of the company. Buffett has credited Munger with encouraging him to focus on long-
term sustainable growth rather than on simply the valuation of current cash flows or assets.[7]
Columbia Business School has played a significant role in shaping the principles of the Value
Investor, with Professors and students making their mark on history and on each other. Ben
Graham’s book, The Intelligent Investor, was Warren Buffett’s bible and he referred to it as "the
greatest book on investing ever written.” A young Warren Buffett studied under Prof. Ben
Graham, took his course and worked for his small investment firm, Graham Newman, from 1954
to 1956. Twenty years after Ben Graham, Prof. Roger Murray arrived and taught value investing
to a young student named Mario Gabelli. About a decade or so later, Prof. Bruce Greenwald
arrived and produced his own protégés, including Mr. Paul Sonkin - just as Ben Graham had Mr.
Buffett as a protégé, and Roger Murray had Mr. Gabelli.
Mutual Series has a well-known reputation for producing top value managers and analysts in this
modern era. This tradition stems from two individuals: Max Heine, the late great value mind and
founder in 1949 of the well-regarded value investment firm Mutual Shares fund, and his protégé,
the legendary value investor Michael F. Price. Mutual Series was sold to Franklin Templeton in
1996. The disciples of Heine and Price quietly practice value investing at some of the most
successful investment firms in the country.
Seth Klarman, a Mutual Series alum and the founder and president of The Baupost Group, a
Boston-based private investment partnership, authored Margin of Safety: Risk-Averse Value
Investing Strategies for the Thoughtful Investor, which has since become a value investing classic.
Now out of print, Margin of Safety has sold on Amazon for $1,200 and eBay for $2,000.[8] Another
famous value investor is John Templeton. He first achieved investing success by buying shares
of a number of companies in the aftermath of the stock market crash of 1929.
Martin J. Whitman is another well-regarded value investor. His approach is called safe-and-
cheap, previously referred to as the financial-integrity approach. Martin Whitman focuses
on acquiring common shares of companies with an extremely strong financial position at a price
reflecting a meaningful discount to the estimated NAV of the company concerned. Martin
Whitman believes it is ill-advised for investors to pay much attention to the trend of macro-
factors (like employment, movement of interest rates, GDP, etc.), not so much because they are
not important as because attempts to predict their movement are almost always futile. Martin
Whitman's letters to shareholders of his Third Avenue Value Fund (TAVF) are considered
valuable resources "for investors to pirate good ideas" by another famous investor, Joel
Greenblatt, in his book on special-situation investment, You Can Be a Stock Market Genius
(ISBN 0-684-84007-3, p. 247).
Joel Greenblatt achieved annual returns at the hedge fund Gotham Capital of over 50% per year
for 10 years from 1985 to 1995 before closing the fund and returning his investors' money. He is
known for investing in special situations such as spin-offs, mergers, and divestitures.
Charles de Vaulx and Jean-Marie Eveillard are well known global value managers. For a time,
these two were paired up at the First Eagle Funds, compiling an enviable track record of risk-
adjusted outperformance. For example, Morningstar designated them the 2001 "International
Stock Manager of the Year" and de Vaulx earned second place from Morningstar for 2006.
Eveillard is known for his Bloomberg appearances where he insists that securities investors
never use margin or leverage. The point is that margin should be considered anathema to
value investing, since a negative price move could prematurely force a sale. In contrast, a
value investor must be able and willing to be patient for the rest of the market to recognize and
correct whatever pricing issue created the momentary value. Eveillard correctly labels the use of
margin or leverage as speculation, the opposite of value investing.
[edit]Criticism
An issue with buying shares in a bear market is that despite appearing undervalued at one time,
prices can still drop along with the market.[9]
An issue with not buying shares in a bull market is that despite appearing overvalued at one time,
prices can still rise along with the market.
Another issue is the method of calculating the "intrinsic value". Two investors can analyze the
same information and reach different conclusions regarding the intrinsic value of the company.
There is no systematic or standard way to value a stock.[10]
Modern portfolio theory (MPT) is a theory of investment which tries to maximize return and
minimize risk by carefully choosing different assets. Although MPT is widely used in practice in
the financial industry and several of its creators won a Nobel prize for the theory, in recent years
the basic assumptions of MPT have been widely challenged by fields such as behavioral
economics.
MPT is a mathematical formulation of the concept of diversification in investing, with the aim of
selecting a collection of investment assets that has collectively lower risk than any individual
asset. This is possible, in theory, because different types of assets often change in value in
opposite ways. For example, when the prices in the stock market fall, the prices in the bond
market often increase, and vice versa. A collection of both types of assets can therefore have
lower overall risk than either individually.
More technically, MPT models an asset's return as a normally distributed random variable,
defines risk as the standard deviation of return, and models a portfolio as a weighted combination
of assets so that the return of a portfolio is the weighted combination of the assets' returns. By
combining different assets whose returns are not correlated, MPT seeks to reduce the total
variance of the portfolio. MPT also assumes that investors are rational and markets are efficient.
MPT was developed in the 1950s through the early 1970s and was considered an important
advance in the mathematical modeling of finance. Since then, much theoretical and practical
criticism has been leveled against it. These include the fact that financial returns do not follow a
Gaussian distribution and that correlations between asset classes are not fixed but can vary
depending on external events (especially in crises). Further, there is growing evidence that
investors are not rational and markets are not efficient.
[edit]Concept
The fundamental concept behind MPT is that the assets in an investment portfolio cannot be
selected individually, each on their own merits. Rather, it is important to consider how each asset
changes in price relative to how every other asset in the portfolio changes in price.
Investing is a tradeoff between risk and return. In general, assets with higher returns are riskier.
For a given amount of risk, MPT describes how to select a portfolio with the highest possible
return. Or, for a given return, MPT explains how to select a portfolio with the lowest possible
risk (the desired return cannot be more than the highest-returning available security, of course.)[1]
MPT is therefore a form of diversification. Under certain assumptions and for specific
quantitative definitions of risk and return, MPT explains how to find the best possible
diversification strategy.
[edit]History
Harry Markowitz introduced MPT in a 1952 article and a 1959 book.[1]
[edit]Mathematical model
In some sense the mathematical derivation below is MPT, although the basic concepts behind the
model have also been very influential.[1]
This section develops the "classic" MPT model. There have been many extensions since.
[edit]Risk and return
MPT assumes that investors are risk averse, meaning that given two assets that offer the same
expected return, investors will prefer the less risky one. Thus, an investor will take on increased
risk only if compensated by higher expected returns. Conversely, an investor who wants higher
returns must accept more risk. The exact trade-off will differ by investor based on individual risk
aversion characteristics. The implication is that a rational investor will not invest in a portfolio if
a second portfolio exists with a more favorable risk-return profile – i.e., if for that level of risk an
alternative portfolio exists which has better expected returns.
MPT further assumes that the investor's risk / reward preference can be described via a
quadratic utility function. The effect of this assumption is that only the expected return and the
volatility (i.e., mean return and standard deviation) matter to the investor. The investor is
indifferent to other characteristics of the distribution of returns, such as its skew (a measure
of the asymmetry of the distribution) or kurtosis (a measure of the thickness of the tails, the
so-called "fat tails").
Note that the theory uses a parameter, volatility, as a proxy for risk, while return is an
expectation on the future. This is in line with the efficient market hypothesis and most of the
classical findings in finance. There are problems with this, see criticism.
Under the model:
• Portfolio return is the proportion-weighted combination of the constituent
assets' returns.
• Portfolio volatility is a function of the correlation ρ of the component assets.
The change in volatility is non-linear as the weighting of the component
assets changes.
In general:
• Expected return: E(R_p) = Σ_i w_i E(R_i), where R_p is the return on the
portfolio, R_i is the return on asset i, and w_i is the weighting of asset i.
• Portfolio return variance: σ_p² = Σ_i Σ_j w_i w_j σ_i σ_j ρ_ij, where σ_i is the
volatility of asset i and ρ_ij is the correlation coefficient between the
returns on assets i and j (with ρ_ii = 1).
• Portfolio return volatility: σ_p = √(σ_p²).
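In code, these definitions reduce to a pair of matrix products. A minimal sketch, in which the
weights, expected returns, volatilities, and correlation are made-up illustration values:

```python
import numpy as np

w = np.array([0.6, 0.4])            # portfolio weights, summing to 1
mu = np.array([0.08, 0.12])         # expected asset returns
sigma = np.array([0.15, 0.25])      # asset volatilities (standard deviations)
rho = np.array([[1.0, 0.3],         # correlation matrix
                [0.3, 1.0]])

cov = np.outer(sigma, sigma) * rho  # covariance matrix: σ_i σ_j ρ_ij

expected_return = w @ mu            # Σ_i w_i E(R_i)
variance = w @ cov @ w              # Σ_i Σ_j w_i w_j σ_i σ_j ρ_ij
volatility = np.sqrt(variance)

print(round(expected_return, 4), round(float(volatility), 4))
```

Note that the portfolio volatility (about 0.153 here) is below the weighted average of the
individual volatilities (0.19), because the assets are only partially correlated.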
[edit]Diversification
An investor can reduce portfolio risk simply by holding combinations of instruments which are
not perfectly positively correlated (correlation coefficient -1 ≤ ρ < 1). In other words, investors
can reduce their exposure to individual asset risk by holding a diversified portfolio of assets.
Diversification will allow for the same portfolio return with reduced risk.
If all the assets of a portfolio have a correlation of +1, i.e., perfect positive correlation, the
portfolio volatility (standard deviation) will be equal to the weighted sum of the individual asset
volatilities. Hence the portfolio variance will be equal to the square of the total weighted sum of
the individual asset volatilities.[2]
If all the assets have a correlation of 0, i.e., perfectly uncorrelated, the portfolio variance is the
sum of the individual asset weights squared times the individual asset variance (and the standard
deviation is the square root of this sum).
If the correlation coefficient is less than zero (ρ < 0), i.e., the assets are inversely correlated, the
portfolio variance and hence volatility will be less than if the correlation coefficient is 0.
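The three correlation cases above can be checked numerically for a two-asset portfolio; the
weights and volatilities below are illustrative:

```python
import math

w1, w2 = 0.5, 0.5          # portfolio weights
s1, s2 = 0.2, 0.3          # asset volatilities

def portfolio_vol(rho):
    """Two-asset portfolio volatility for correlation coefficient rho."""
    var = (w1 * s1) ** 2 + (w2 * s2) ** 2 + 2 * w1 * w2 * s1 * s2 * rho
    return math.sqrt(var)

print(round(portfolio_vol(1.0), 4))   # equals w1*s1 + w2*s2 = 0.25
print(round(portfolio_vol(0.0), 4))   # strictly less than 0.25
print(round(portfolio_vol(-1.0), 4))  # equals |w1*s1 - w2*s2| = 0.05
```

Only in the perfectly-positively-correlated case does the portfolio volatility equal the weighted
sum of the individual volatilities; any lower correlation yields a diversification benefit.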
[edit]The risk-free asset
The risk-free asset is the (hypothetical) asset which pays a risk-free rate. In practice, short-term
Government securities (such as US treasury bills) are used as a risk-free asset, because they pay
a fixed rate of interest and have exceptionally low default risk. The risk-free asset has zero
variance in returns (hence is risk-free); it is also uncorrelated with any other asset (by definition:
since its variance is zero). As a result, when it is combined with any other asset, or portfolio of
assets, the change in return and also in risk is linear.
Because both risk and return change linearly as the risk-free asset is introduced into a portfolio,
this combination will plot a straight line in risk-return space. The line starts at 100% in the risk-
free asset and weight of the risky portfolio = 0 (i.e., intercepting the return axis at the risk-free
rate) and goes through the portfolio in question where risk-free asset holding = 0 and portfolio
weight = 1.
Using the formulae for a two asset portfolio as above:
• Return is the weighted average of the risk-free asset, f, and the risky
portfolio, p, and is therefore linear: E(R) = w_f R_f + (1 - w_f) E(R_p), where
w_f is the weight of the risk-free asset.
• Since the asset is risk free, portfolio standard deviation is simply a function of
the weight of the risky portfolio in the position: σ = (1 - w_f) σ_p. This
relationship is linear.
[edit]Capital allocation line
The capital allocation line (CAL) is the line of expected return plotted against risk (standard
deviation) that connects all portfolios that can be formed using a risky asset and a riskless asset.
It can be proven that it is a straight line with the following equation:
E(R_C) = R_F + σ_C (E(R_P) - R_F) / σ_P
In this formula P is the risky portfolio, F is the riskless asset, and C is a combination of
portfolios P and F.
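The linearity of both return and risk when mixing a risk-free asset with a risky portfolio can be
checked directly. In this sketch the risk-free rate, the risky portfolio's expected return, and its
volatility are invented illustration values:

```python
rf, mu_p, sigma_p = 0.03, 0.10, 0.20   # illustrative: risk-free rate, risky E(R), risky vol

def combined(w_risky):
    """(expected return, volatility) for weight w_risky in the risky portfolio."""
    ret = (1 - w_risky) * rf + w_risky * mu_p
    vol = w_risky * sigma_p            # risk-free asset contributes zero variance
    return ret, vol

for w in (0.0, 0.5, 1.0):
    ret, vol = combined(w)
    # Each point lies on the capital allocation line:
    # ret == rf + vol * (mu_p - rf) / sigma_p
    print(w, round(ret, 4), round(vol, 4))
```

Sweeping the weight from 0 to 1 traces the straight line from (0, R_F) to (σ_P, E(R_P)) described
above.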
[edit]The efficient frontier
Every possible asset combination can be plotted in risk-return space, and the collection of all
such possible portfolios defines a region in this space. The line along the upper edge of this
region is known as the efficient frontier (sometimes "the Markowitz frontier"). Combinations
along this line represent portfolios (explicitly excluding the risk-free alternative) for which there
is lowest risk for a given level of return. Conversely, for a given amount of risk, the portfolio
lying on the efficient frontier represents the combination offering the best possible return.
Mathematically the Efficient Frontier is the intersection of the Set of Portfolios with Minimum
Variance (MVS) and the Set of Portfolios with Maximum Return. Formally, the efficient frontier
is the set of maximal elements with respect to the product order on risk and return: the set of
portfolios for which one cannot improve both risk and return.
The efficient frontier is conventionally illustrated with return μp on the y-axis and risk σp on
the x-axis; an alternative illustration appears in the CAPM article.
The efficient frontier will be convex – this is because the risk-return characteristics of a portfolio
change in a non-linear fashion as its component weightings are changed. (As described above,
portfolio risk is a function of the correlation of the component assets, and thus changes in a non-
linear fashion as the weighting of component assets changes.) The efficient frontier is a parabola
when expected return is plotted against variance, and a hyperbola when it is plotted against
standard deviation.
The region above the frontier is unachievable by holding risky assets alone. No portfolios can be
constructed corresponding to the points in this region. Points below the frontier are suboptimal.
A rational investor will hold a portfolio only on the frontier.
Matrices are preferred for calculations of the efficient frontier. In matrix form, for a given "risk
tolerance" q ∈ [0, ∞), the efficient frontier is found by minimizing
w^T Σ w - q R^T w subject to Σ_i w_i = 1
where w is the vector of portfolio weights, Σ is the covariance matrix of asset returns, and R is
the vector of expected returns. An individual asset's return can also be regressed against the
market return, R_i = α_i + β_i R_m + ε_i, where α_i is called the asset's alpha and β_i the asset's
beta coefficient.
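When short sales are allowed, the budget-constrained minimization for a given risk tolerance q
has a closed-form solution via a single Lagrange multiplier. This is a sketch under that
assumption; the covariance matrix and expected returns are made-up values:

```python
import numpy as np

# Illustrative inputs: covariance matrix Sigma and expected returns R for two assets
Sigma = np.array([[0.04, 0.006],
                  [0.006, 0.09]])
R = np.array([0.08, 0.12])
ones = np.ones(2)

def frontier_weights(q):
    """Minimize w' Sigma w - q R' w subject to sum(w) = 1 (short sales allowed).

    First-order condition: 2 Sigma w = q R + lam * 1, with lam chosen
    so that the weights sum to one.
    """
    inv = np.linalg.inv(Sigma)
    lam = (2 - q * ones @ inv @ R) / (ones @ inv @ ones)
    return inv @ (q * R + lam * ones) / 2

w = frontier_weights(0.5)
print(np.round(w, 4), round(float(w @ R), 4))
```

Sweeping q from 0 upward traces the efficient frontier: q = 0 gives the minimum-variance
portfolio, and larger q tilts the weights toward higher expected return at higher risk.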
[edit]Capital asset pricing model
Main article: Capital Asset Pricing Model
The asset return depends on the amount paid for the asset today. The price paid must ensure that the
market portfolio's risk / return characteristics improve when the asset is added to it. The CAPM
is a model which derives the theoretical required return (i.e., discount rate) for an asset in a
market, given the risk-free rate available to investors and the risk of the market as a whole. The
CAPM is usually expressed:
E(R_a) = R_f + β_a (E(R_m) - R_f)
where β_a = cov(R_a, R_m) / σ_m² is the asset's sensitivity to movements in the market (its beta),
R_f is the risk-free rate, and E(R_m) is the expected return of the market. It is rational to add
the asset to the market portfolio only if its expected return at the price paid at least
compensates for the risk it contributes to the portfolio.
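A sample beta of the kind used in the CAPM can be estimated from return series. The monthly
returns, risk-free rate, and expected market return below are fabricated purely for illustration:

```python
import numpy as np

# Fabricated monthly returns for an asset and for the market
asset = np.array([0.02, -0.01, 0.03, 0.015, -0.02, 0.04])
market = np.array([0.015, -0.005, 0.02, 0.01, -0.015, 0.03])

# Sample beta: covariance with the market divided by market variance
beta = np.cov(asset, market, ddof=1)[0, 1] / np.var(market, ddof=1)

rf, expected_market = 0.002, 0.008       # illustrative risk-free and market returns
required_return = rf + beta * (expected_market - rf)   # the CAPM relation
print(round(float(beta), 3), round(float(required_return), 4))
```

In practice betas are estimated from much longer histories and, as the criticism section below
notes, such historical estimates need not describe future behavior.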
[edit]Criticism
Despite its theoretical importance, some people question whether MPT is an ideal investing
strategy, because its model of financial markets does not match the real world in many ways.
[edit]Assumptions
The mathematical framework of MPT makes many assumptions about investors and markets.
Some are explicit in the equations, such as the use of Normal distributions to model returns.
Others are implicit, such as the neglect of taxes and transaction fees. None of these assumptions
are entirely true, and each of them compromises MPT to some degree.
• Asset returns are (jointly) normally distributed random variables. In
fact, it is frequently observed that returns in equity and other markets are not
normally distributed. Large swings (3 to 6 standard deviations from the
mean) occur in the market far more frequently than the normal distribution
assumption would predict. [3]
• Correlations between assets are fixed and constant forever.
Correlations depend on systemic relationships between the underlying
assets, and change when these relationships change. Examples include one
country declaring war on another, or a general market crash. During times of
financial crisis all assets tend to become positively correlated, because they
all move (down) together. In other words, MPT breaks down precisely when
investors are most in need of protection from risk.
• All investors aim to maximize economic utility (in other words, to
make as much money as possible, regardless of any other
considerations). This is a key assumption of the efficient market
hypothesis, upon which MPT relies.
• All investors are rational and risk-averse. This is another assumption of
the efficient market hypothesis, but we now know from behavioral economics
that market participants are not rational. It does not allow for "herd behavior"
or investors who will accept lower returns for higher risk. Casino gamblers
clearly pay for risk, and it is possible that some stock traders will pay for risk
as well.
• All investors have access to the same information at the same time.
This also comes from the efficient market hypothesis. In fact, real markets
contain information asymmetry, insider trading, and those who are simply
better informed than others.
• Investors have an accurate conception of possible returns, i.e., the
probability beliefs of investors match the true distribution of
returns. A different possibility is that investors' expectations are biased,
causing market prices to be informationally inefficient. This possibility is
studied in the field of behavioral finance, which uses psychological
assumptions to provide alternatives to the CAPM such as the overconfidence-
based asset pricing model of Kent Daniel, David Hirshleifer, and Avanidhar
Subrahmanyam (2001)[4].
• There are no taxes or transaction costs. Real financial products are
subject both to taxes and transaction costs (such as broker fees), and taking
these into account will alter the composition of the optimum portfolio. These
assumptions can be relaxed with more complicated versions of the model.
[citation needed]
• All investors are price takers, i.e., their actions do not influence
prices. In reality, sufficiently large sales or purchases of individual assets can
shift market prices for that asset and others (via cross-elasticity of demand.)
An investor may not even be able to assemble the theoretically optimal
portfolio if the market moves too much while they are buying the required
securities.
• Any investor can lend and borrow an unlimited amount at the risk
free rate of interest. In reality, every investor has a credit limit.
• All securities can be divided into parcels of any size. In reality,
fractional shares usually cannot be bought or sold, and some assets have
minimum order sizes.
More complex versions of MPT can take into account a more sophisticated model of the world
(such as one with non-normal distributions and taxes) but all mathematical models of finance
still rely on many unrealistic premises.
[edit]MPT does not really model the market
The risk, return, and correlation measures used by MPT are expected values, which means that
they are mathematical statements about the future (the expected value of returns is explicit in the
above equations, and implicit in the definitions of variance and covariance.) In practice investors
must substitute predictions based on historical measurements of asset return and volatility for
these values in the equations. Very often such predictions are wrong, as captured in the classic
disclaimer "past performance is not necessarily indicative of future results."
More fundamentally, investors are stuck with estimating key parameters from past market data
because MPT attempts to model risk in terms of the likelihood of losses, but says nothing about
why those losses might occur. The risk measurements used are probabilistic in nature, not
structural. This is a major difference as compared to many engineering approaches to risk
management.
Options theory and MPT have at least one important conceptual difference from the probabilistic risk
assessment done by nuclear power [plants]. A PRA is what economists would call a structural model. The
components of a system and their relationships are modeled in Monte Carlo simulations. If valve X fails,
it causes a loss of back pressure on pump Y, causing a drop in flow to vessel Z, and so on.
But in the Black-Scholes equation and MPT, there is no attempt to explain an underlying structure to
price changes. Various outcomes are simply given probabilities. And, unlike the PRA, if there is no
history of a particular system-level event like a liquidity crisis, there is no way to compute the odds of it.
If nuclear engineers ran risk management this way, they would never be able to compute the odds of a
meltdown at a particular plant until several similar events occurred in the same reactor design.
—Douglas W. Hubbard, 'The Failure of Risk Management', p. 67, John Wiley & Sons, 2009. ISBN
978-0-470-38795-5
Essentially, the mathematics of MPT view the markets as a collection of dice. By examining past
market data we can develop hypotheses about how the dice are weighted, but this isn't helpful if
the markets are actually dependent upon a much bigger and more complicated chaotic system --
the world. For this reason, accurate structural models of real financial markets are unlikely to be
forthcoming because they would essentially be structural models of the entire world. Nonetheless
there is growing awareness of the concept of systemic risk in financial markets, which should
lead to more sophisticated market models.
[edit]Variance is not a good measure of risk
Mathematical risk measurements are also useful only to the degree that they reflect investors'
true concerns: there is no point minimizing a variable that nobody cares about in practice. MPT
uses the mathematical concept of variance to quantify risk, and this might be justified under the
assumption of normally distributed returns, but for general return distributions other risk
measures (like coherent risk measures) might better reflect investors' true preferences.
In particular, variance is a symmetric measure that counts abnormally high returns as just as
risky as abnormally low returns. In reality, investors are only concerned about losses, which
shows that our intuitive concept of risk is fundamentally asymmetric in nature.
[edit]"Optimal" doesn't necessarily mean "most profitable"
MPT does not account for the social, environmental, strategic, or personal dimensions of
investment decisions. It only attempts to maximize returns, without regard to other
consequences. In a narrow sense, its complete reliance on asset prices makes it vulnerable to all
the standard market failures such as those arising from information asymmetry, externalities, and
public goods. It also rewards corporate fraud and dishonest accounting. More broadly, a firm
may have strategic or social goals that shape its investment decisions, and an individual investor
might have personal goals. In either case, information other than historical returns is relevant.
See also socially-responsible investing, fundamental analysis.
[edit]Extensions
Since MPT's introduction in 1952, many attempts have been made to improve the model,
especially by using more realistic assumptions.
Post-modern portfolio theory extends MPT by adopting non-normally distributed, asymmetric
measures of risk. This helps with some of these problems, but not others.
[edit]Other Applications
[edit]Applications to project portfolios and other "non-financial" assets
Some experts apply MPT to portfolios of projects and other assets besides financial instruments.
[5]
When MPT is applied outside of traditional financial portfolios, some differences between the
different types of portfolios must be considered.
1. The assets in financial portfolios are, for practical purposes, continuously
divisible while portfolios of projects like new software development are
"lumpy". For example, while we can compute that the optimal portfolio
position for 3 stocks is, say, 44%, 35%, 21%, the optimal position for an IT
portfolio may not allow us to simply change the amount spent on a project. IT
projects might be all or nothing or, at least, have logical units that cannot be
separated. A portfolio optimization method would have to take the discrete
nature of some IT projects into account.
2. The assets of financial portfolios are liquid and can be assessed or re-assessed at
any point in time, while opportunities for new projects may be limited and
may appear in limited windows of time, and projects that have already been
initiated cannot be abandoned without the loss of sunk costs (i.e., there is
little or no recovery/salvage value of a half-complete IT project).
Neither of these necessarily eliminates the possibility of using MPT on such portfolios. They
simply indicate the need to run the optimization with an additional set of mathematically-
expressed constraints that would not normally apply to financial portfolios.
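For a small portfolio, the "lumpy", all-or-nothing constraint can be handled by brute-force 0/1
selection rather than continuous weights. The project costs, expected values, and budget below
are invented illustration numbers:

```python
from itertools import product

# Hypothetical projects: (cost, expected value)
projects = [(40, 60), (30, 40), (50, 80), (20, 25)]
budget = 90

best_value, best_pick = 0, ()
for pick in product([0, 1], repeat=len(projects)):   # each project is all-or-nothing
    cost = sum(p * c for p, (c, _) in zip(pick, projects))
    value = sum(p * v for p, (_, v) in zip(pick, projects))
    if cost <= budget and value > best_value:
        best_value, best_pick = value, pick

print(best_pick, best_value)  # (1, 0, 1, 0) 140
```

This exhaustive search is only feasible for a handful of projects; larger discrete portfolios call
for integer-programming methods, and a fuller treatment would optimize risk alongside expected
value as MPT does.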
Furthermore, some of the simplest elements of Modern Portfolio Theory are applicable to
virtually any kind of portfolio. The concept of capturing the risk tolerance of an investor by
documenting how much risk is acceptable for a given return could be and is applied to a variety
of decision analysis problems. MPT, however, uses historical variance as a measure of risk and
portfolios of assets like IT projects don't usually have an "historical variance" for a new piece of
software. In this case, the MPT investment boundary can be expressed in more general terms like
"chance of an ROI less than cost of capital" or "chance of losing more than half of the
investment". When risk is put in terms of uncertainty about forecasts and possible losses then the
concept is transferable to various types of investment.[5]
[edit]Application to other disciplines
In the 1970s, concepts from Modern Portfolio Theory found their way into the field of regional
science. In a series of seminal works, Michael Conroy modeled the labor force in the economy
using portfolio-theoretic methods to examine growth and variability in the labor force. This was
followed by a long literature on the relationship between economic growth and volatility.[6]
More recently, modern portfolio theory has been used to model the self-concept in social
psychology. When the self attributes comprising the self-concept constitute a well-diversified
portfolio, then psychological outcomes at the level of the individual such as mood and self-
esteem should be more stable than when the self-concept is undiversified. This prediction has
been confirmed in studies involving human subjects.[7]
Recently, modern portfolio theory has been applied to modelling the uncertainty and correlation
between documents in information retrieval. Given a query, the aim is to maximize the overall
relevance of a ranked list of documents and at the same time minimize the overall uncertainty of
the ranked list.[1]
[edit]Comparison with arbitrage pricing theory
The SML and CAPM are often contrasted with the arbitrage pricing theory (APT), which holds
that the expected return of a financial asset can be modeled as a linear function of various macro-
economic factors, where sensitivity to changes in each factor is represented by a factor specific
beta coefficient.
The APT is less restrictive in its assumptions: it allows for an explanatory (as opposed to
statistical) model of asset returns, and assumes that each investor will hold a unique portfolio
with its own particular array of betas, as opposed to the identical "market portfolio". Unlike the
CAPM, the APT, however, does not itself reveal the identity of its priced factors; the number
and nature of these factors are likely to change over time and between economies.
Hyperbolic discounting
From Wikipedia, the free encyclopedia
Given two similar rewards, humans show a preference for one that arrives sooner rather than
later. Humans are said to discount the value of the later reward, by a factor that increases with
the length of the delay. In behavioral economics, hyperbolic discounting is a particular
mathematical model thought to approximate this discounting process; that is, it models how
humans actually make such valuations. Hyperbolic discounting is sharply different in form from
exponential discounting, a rational function used in finance used in the analysis of choice over
time. Hyperbolic discounting has been observed in humans and animals.
In hyperbolic discounting, valuations fall very rapidly for small delay periods, but then fall
slowly for longer delay periods. This contrasts with exponential discounting, in which valuation
falls by a constant factor per unit delay, regardless of the total length of the delay. The standard
experiment used to reveal a test subject's discounting curve is to ask: "Would you prefer A today
or B tomorrow?" and then, "Would you prefer A in one year, or B in one year and one day?"
For example, in studies of pigeons,[1] the pigeon is given two buttons: button A provides a small
amount of food quickly, while button B provides more food but after a delay. The bird then
experiments for a while and settles on preferring A or B. With humans, the typical experiment
might ask: "Would you prefer a dollar today or three dollars tomorrow?" and "Would you prefer
a dollar in one year or three dollars in one year and one day?" Typically, subjects will take the
smaller amount today rather than wait until tomorrow, but will gladly wait one extra day in a
year in order to receive more money.[citation needed]
Subjects using hyperbolic discounting reveal a strong tendency to make choices that are
inconsistent over time. In other words, they make choices today that their future self would
prefer not to make, despite using the same reasoning. This dynamic inconsistency[2] happens
because hyperbolic discounting devalues rewards over short delays far more steeply than over
long ones, so the preference between two delayed rewards can reverse as both draw nearer.
[edit]Observations
The phenomenon of hyperbolic discounting is implicit in Richard Herrnstein's "matching law,"
the discovery that most subjects allocate their time or effort between two non-exclusive, ongoing
sources of reward (concurrent variable interval schedules) in direct proportion to the rate and size
of rewards from the two sources, and in inverse proportion to their delays. That is, subjects'
choices "match" these parameters.
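On a simplified reading of the matching law described above (allocation in direct proportion to the rate and size of rewards, and in inverse proportion to their delays), the shares can be sketched as follows; the function and all numbers are hypothetical illustrations:

```python
def matching_allocation(sources):
    """Share of time/effort per source, proportional to rate * size / delay
    (a simplified reading of Herrnstein's matching law)."""
    weights = [rate * size / delay for rate, size, delay in sources]
    total = sum(weights)
    return [w / total for w in weights]

# Two hypothetical sources: (reward rate, reward size, delay).
# Doubling a source's delay halves its weight, and so its share of responding.
print(matching_allocation([(2.0, 1.0, 1.0), (1.0, 1.0, 2.0)]))
```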
After the report of this effect in the case of delay (Chung and Herrnstein, 1967), George Ainslie
pointed out that in a single choice between a larger, later and a smaller, sooner reward, inverse
proportionality to delay would be described by a plot of value by delay that had a hyperbolic
shape, and that this shape should produce a reversal of preference from the larger, later to the
smaller, sooner reward for no other reason but that the delays to the two rewards got shorter. He
demonstrated the predicted reversal in pigeons[vague] (Ainslie, 1974).
A large number of subsequent experiments have confirmed that spontaneous preferences by both
human and nonhuman subjects follow a hyperbolic curve rather than the conventional,
"exponential" curve that would produce consistent choice over time (Green et al., 1994; Kirby,
1997). For instance, when offered the choice between $50 now and $100 a year from now, many
people will choose the immediate $50. However, given the choice between $50 in five years or
$100 in six years almost everyone will choose $100 in six years, even though that is the same
choice seen at five years' greater distance.
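The $50/$100 reversal can be reproduced with the hyperbolic formula f(D) = 1/(1 + kD); the daily rate k = 0.01 used here is a hypothetical illustration, not a figure from the studies cited:

```python
def hyperbolic_value(amount, delay_days, k=0.01):
    """Present value of a delayed reward under hyperbolic discounting,
    f(D) = 1 / (1 + k*D). k = 0.01 per day is a hypothetical parameter."""
    return amount / (1.0 + k * delay_days)

# $50 now beats $100 in a year...
assert hyperbolic_value(50, 0) > hyperbolic_value(100, 365)

# ...but $100 in six years beats $50 in five years: a preference reversal.
assert hyperbolic_value(100, 6 * 365) > hyperbolic_value(50, 5 * 365)

# Under exponential discounting the ratio of the two values would be the same
# at both horizons, so no reversal could occur.
```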
Hyperbolic discounting has also been found to relate to real-world examples of self control.
Indeed, a variety of studies have used measures of hyperbolic discounting to find that drug-
dependent individuals discount delayed consequences more than matched nondependent
controls, suggesting that extreme delay discounting is a fundamental behavioral process in drug
dependence (e.g., Bickel & Johnson, 2003; Madden et al., 1997; Vuchinich & Simpson, 1998).
Some evidence suggests pathological gamblers also discount delayed outcomes at higher rates
than matched controls (e.g., Petry & Casarella, 1999). Whether high rates of hyperbolic
discounting precede addictions or vice-versa is currently unknown, although some studies have
reported that high-rate discounting rats are more likely to consume alcohol (e.g., Poulos et al.,
1995) and cocaine (Perry et al., 2005) than lower-rate discounters. Likewise, some have
suggested that high-rate hyperbolic discounting makes unpredictable (gambling) outcomes more
satisfying (Madden et al., 2007).
The degree of discounting is vitally important in describing hyperbolic discounting, especially in
the discounting of specific rewards such as money. The discounting of monetary rewards varies
across age groups due to the varying discount rate (Green, Frye, and Myerson, 1994). The rate
depends on a variety of factors, including the species being observed, age, experience, and the
amount of time needed to consume the reward (Lowenstein and Prelec, 1992; Raineri and
Rachlin, 1993).
[edit]Mathematical model
The hyperbolic discount function is
f_H(D) = 1 / (1 + kD),
where f(D) is the discount factor that multiplies the value of the reward, D is the delay in the
reward, and k is a parameter governing the degree of discounting. This is compared with the
formula for exponential discounting:
f_E(D) = e^(−kD)
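As a sketch (with a hypothetical k, not a value from the text), the two discount factors and their per-unit-delay decay can be compared:

```python
import math

def f_hyperbolic(D, k):
    """Hyperbolic discount factor: f_H(D) = 1 / (1 + k*D)."""
    return 1.0 / (1.0 + k * D)

def f_exponential(D, k):
    """Exponential discount factor: f_E(D) = e^(-k*D)."""
    return math.exp(-k * D)

# One-step decay factor f(D+1)/f(D): constant for exponential discounting,
# but rising toward 1 for hyperbolic discounting; value falls steeply at
# short delays and only slowly at long delays, as described above.
k = 0.5
for D in (0, 1, 10, 100):
    print(D,
          round(f_hyperbolic(D + 1, k) / f_hyperbolic(D, k), 3),
          round(f_exponential(D + 1, k) / f_exponential(D, k), 3))
```

With the same k, the exponential one-step factor is e^(−k) at every delay, while the hyperbolic factor (1 + kD) / (1 + k(D + 1)) approaches 1 as D grows.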
[edit]Quasi-hyperbolic approximation
The "quasi-hyperbolic" discount function, which approximates the hyperbolic discount function
above, is given (in discrete time) by
f_QH(0) = 1, and f_QH(D) = β·δ^D for D > 0,
where β and δ are constants between 0 and 1; again, D is the delay in the reward, and f(D) is
the discount factor. The condition f_QH(0) = 1 states that rewards taken at the present time are
not discounted.
Quasi-hyperbolic time preferences are also referred to as "present-biased" or "beta-delta"
preferences. They retain much of the analytical tractability of exponential discounting while
capturing the key qualitative feature of discounting with true hyperbolas.
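A minimal sketch of the beta-delta function, with illustrative (not sourced) parameter values:

```python
def f_quasi_hyperbolic(D, beta=0.7, delta=0.95):
    """Quasi-hyperbolic (beta-delta) discount factor:
    f(0) = 1 and f(D) = beta * delta**D for D >= 1.
    beta = 0.7 and delta = 0.95 are hypothetical illustration values."""
    return 1.0 if D == 0 else beta * delta ** D

# A single drop of size roughly beta at the first step captures "present
# bias"; after that, discounting is purely exponential in delta.
print([round(f_quasi_hyperbolic(D), 3) for D in range(4)])
```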
[edit]Explanations
[edit]Uncertain risks
Notice that whether discounting future gains is rational or not – and at what rate such gains
should be discounted – depends greatly on circumstances. Many examples exist in the financial
world, for example, where it is reasonable to assume that there is an implicit risk that the reward
will not be available at the future date, and furthermore that this risk increases with time.
Consider: Paying $50 for your dinner today or delaying payment for sixty years but paying
$100,000. In this case the restaurateur would be reasonable to discount the promised future value
as there is significant risk that it might not be paid (possibly due to your death, his death, etc.).
Uncertainty of this type can be quantified with Bayesian analysis. [3] For example, suppose that
the probability for the reward to be available after time t is, for a known hazard rate λ,
P(R_t | λ) = exp(−λt),
but the rate is unknown to the decision maker. If the prior probability distribution of λ is
p(λ) = exp(−λ/k) / k,
then the decision maker will expect that the probability of the reward after time t is
P(R_t) = ∫₀^∞ P(R_t | λ) p(λ) dλ = 1 / (1 + kt),
which is exactly the hyperbolic discount factor. Similar conclusions can be obtained from other
plausible distributions for λ. [3]
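A numerical sketch of this expectation (hypothetical k and t values; a simple midpoint quadrature, not a library routine):

```python
import math

def expected_survival(t, k, n=200_000, lam_max=50.0):
    """Midpoint-rule estimate of E[exp(-lam * t)] when the hazard rate lam
    has prior density p(lam) = exp(-lam / k) / k on [0, infinity)."""
    dlam = lam_max / n
    total = 0.0
    for i in range(n):
        lam = (i + 0.5) * dlam
        total += math.exp(-lam * t) * math.exp(-lam / k) / k * dlam
    return total

k, t = 0.5, 3.0
# The integral evaluates to 1 / (1 + k*t), the hyperbolic discount factor.
print(expected_survival(t, k), 1.0 / (1.0 + k * t))
```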
[edit]Applications
More recently these observations about discount functions have been used to study saving for
retirement, borrowing on credit cards, and procrastination. However, hyperbolic discounting has
been most frequently used to explain addiction.
Market anomaly
From Wikipedia, the free encyclopedia
A market anomaly (or inefficiency) is a price and/or return distortion on a financial market.
It is usually related to:
• either structural factors (unfair competition, lack of market transparency, ...)
• or behavioral biases by economic agents (see behavioral economics)
It sometimes refers to phenomena that contradict the efficient market hypothesis. Anomalies
may be fundamental, technical, or calendar-related. Fundamental anomalies include the value
effect and the small-cap effect (low P/E stocks and small-cap companies do better than the index
on average). Calendar anomalies involve patterns in stock returns from year to year or month to
month, while technical anomalies include the momentum effect. Some further information is
available at [1].
See also efficient market
Transparency (market)
From Wikipedia, the free encyclopedia
In economics, a market is transparent if much is known by many about:
• What products, services or capital assets are available.
• At what price.
• Where.
There are two types of price transparency: 1) I know what price will be charged to me, and 2) I
know what price will be charged to you. The two types of price transparency have different
implications for differential pricing.[1]
This is a special case of the topic at transparency (humanities).
A high degree of market transparency can result in disintermediation due to the buyer's
increased knowledge of supply pricing.
Transparency is important since it is one of the theoretical conditions required for a free market
to be efficient.
Price transparency can, however, lead to higher prices, if it makes sellers reluctant to give steep
discounts to certain buyers, or if it facilitates collusion.
Noisy market hypothesis
In finance, the Noisy Market Hypothesis contrasts with the efficient market hypothesis in that it
claims that the prices of securities are not always the best estimate of the true underlying value of the
firm. It argues that prices can be influenced by speculators and momentum traders, as well as by
insiders and institutions that often buy and sell stocks for reasons unrelated to fundamental value,
such as for diversification, liquidity and taxes. These temporary shocks, referred to as "noise," can
obscure the true value of securities and may result in mispricing of these securities for many
years. [1]
[edit]PAM
The Policy Analysis Market (PAM), while technically a futures market, was described as
utilizing the Dumb Agent Theory[4]. The main difference, as argued by James Surowiecki, is that
in a futures market the current stock prices are known in advance, while in order for the Dumb
Agent Theory to work, they should be unknown to the investor prior to the decision making
period (which is possible only with a prediction market).
[edit]Prediction Markets
For the Dumb Agent Theory to hold, investors should not know what other investors are doing
prior to making their decision[3]. While this is technically impossible in a futures exchange
(because what other people are deciding dictates the price of the security), it can be done in a
Prediction Market. Certain prediction markets are set up in this manner (such as Predictify,
although it allows participants to change their answers after their initial prediction).
Bid-offer spread
From Wikipedia, the free encyclopedia
The bid/offer spread (also known as bid/ask or buy/sell spread) for securities (such as stock,
futures contracts, options, or currency pairs) is the difference between the price quoted by a
market maker for an immediate sale (bid) and an immediate purchase (ask). The size of the bid-
offer spread in a given commodity is a measure of the liquidity of the market and the size of the
transaction cost. [1]
The trader initiating the transaction is said to demand liquidity, and the other party (counterparty)
to the transaction supplies liquidity. Liquidity demanders place market orders and liquidity
suppliers place limit orders. For a round trip (a purchase and sale together) the liquidity
demander pays the spread and the liquidity supplier earns the spread. All limit orders outstanding
at a given time (i.e., limit orders that have not been executed) are together called the Limit Order
Book. In some markets such as NASDAQ, dealers supply liquidity. However, on most
exchanges, such as the Australian Securities Exchange, there are no designated liquidity
suppliers, and liquidity is supplied by other traders. On these exchanges, and even on NASDAQ,
institutions and individuals can supply liquidity by placing limit orders.
The bid-ask spread is an accepted measure of liquidity costs in exchange traded securities and
commodities. On any standardized exchange two elements comprise almost all of the transaction
cost – brokerage fees and bid-ask spreads. Under competitive conditions the bid-ask spread
measures the cost of making transactions without delay. The difference in price paid by an urgent
buyer and received by an urgent seller is the liquidity cost. Since brokerage commissions do not
vary with the time taken to complete a transaction, differences in bid-ask spread indicate
differences in the liquidity cost.[2]
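A round trip at quoted prices makes the liquidity cost concrete; the helper and the quotes below are hypothetical illustrations:

```python
def round_trip_cost(bid, ask, quantity):
    """Cost of an immediate buy at the ask followed by an immediate sale at
    the bid: the liquidity demander pays the full spread per unit traded."""
    return ask * quantity - bid * quantity   # == (ask - bid) * quantity

# Hypothetical quotes: bid 99.95 / ask 100.05, 1,000 shares;
# the round trip costs the 0.10 spread on every share.
print(round_trip_cost(99.95, 100.05, 1000))
```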
[edit]Example: Currency spread
Suppose the current bid price for the EUR/USD currency pair is 1.5760 and the current ask price
is 1.5763; this means that you can currently sell EUR/USD at 1.5760 and buy it at 1.5763. The
difference between those prices is the spread. If the USD/JPY currency pair is currently trading
at 101.89/92, that is another way of saying that the bid for USD/JPY is 101.89 and the ask is
101.92. This means that holders of JPY can currently sell JPY for 1 US dollar at 101.89 and
investors who wish to buy JPY can do so at 101.92 per US dollar.[3]
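The two quotes above can be turned into spreads in pips with a small helper (a sketch; the pip sizes used are the conventional ones for these pairs):

```python
def spread_in_pips(bid, ask, pip=0.0001):
    """Quoted spread expressed in pips; the pip size is instrument-specific
    (0.0001 for EUR/USD, 0.01 for USD/JPY)."""
    return round((ask - bid) / pip)

print(spread_in_pips(1.5760, 1.5763))            # EUR/USD: 3-pip spread
print(spread_in_pips(101.89, 101.92, pip=0.01))  # USD/JPY: 3-pip spread
```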
[edit]Example: Stock spread
On United States stock exchanges, the minimum spread (also known as the tick size) for many
shares was 12.5 cents (one-eighth of a dollar) until 2001, when the exchanges converted from
fractional to decimal pricing, enabling spreads as small as one cent. The change was mandated
by the U.S. Securities and Exchange Commission in order to provide a fairer market for the
individual investor.
Market depth
From Wikipedia, the free encyclopedia
In finance, market depth is the size of an order needed to move the market a given amount. If
the market is deep, a large order is needed to change the price. Market depth closely relates to the
notion of liquidity, the ease to find a trading partner for a given order: a deep market is also a
liquid market.
Factors influencing market depth include:
• Tick size. This refers to the minimum price increment at which trades may be
made on the market. The major stock markets in the United States went
through a process of decimalisation in April 2001. This switched the minimum
increment from a sixteenth to a one hundredth of a dollar. This decision
improved market depth.[1]
• Price movement restrictions. Most major financial markets do not allow
completely free exchange of the products they trade, but instead restrict
price movement in well-intentioned ways. These include session price change
limits on major commodity markets and program trading curbs on the NYSE,
which disallow certain large basket trades after the Dow Jones Industrial
Average has moved up or down 200 points in a session.
• Trading restrictions. These include futures contract and options position
limits as well as the widely used uptick rule for US stocks. These prevent
market participants from adding to depth when they might otherwise choose
to do so.
• Allowable leverage. Major markets and governing bodies typically set
minimum margin requirements for trading various products. While this may
act to stabilize the marketplace, it decreases the market depth simply
because participants otherwise willing to take on very high leverage cannot
do so without providing more capital.
• Market transparency. While the latest bid or ask price is usually available
for most participants, additional information about the size of these offers
and pending bids or offers that are not the best are sometimes hidden for
reasons of technical complexity or simplicity. This decrease in available
information can affect the willingness of participants to add to market depth.
In some cases, the term refers to financial data feeds available from exchanges or brokers. An
example would be NASDAQ Level II quote data.
Slippage (finance)
From Wikipedia, the free encyclopedia
This article is about the financial concept. For other uses, see Slippage.
With regards to futures contracts as well as other financial instruments, slippage is the difference
between estimated transaction costs and the amount actually paid. Brokers may not always be
effective enough at executing orders. Market impact, liquidity, and frictional costs may also
contribute. Algorithmic trading is often used to reduce slippage.
[edit]Measurement
Using initial mid price
Taleb (1997) defines slippage as the difference between the average execution price and the
initial midpoint of the bid and the offer for a given quantity to be executed.
An example from a speculator:
"In statistical terms, I figure I have traded about 2 million contracts, with an average profit of $70
per contract (after slippage of perhaps $20). This average is approximately 700 standard
deviations away from randomness."[1]
[edit]Reverse Slippage
Reverse slippage as described by Taleb occurs when the purchase of a large position is done at
increasing prices, so that the mark to market value of the position increases. The danger occurs
when the trader attempts to exit his position. If the trader manages to create a squeeze large
enough then this phenomenon can be profitable.
[edit]Leveraged portfolio
A portfolio of securities that is leveraged with borrowed funds will encounter slippage because
leverage multiplies the portfolio's increases and decreases (see leverage (finance)).
• Nassim Taleb (1997). Dynamic Hedging: Managing Vanilla and Exotic Options.
John Wiley & Sons. ISBN 978-0471152804.
• John L. Knight, Stephen Satchell (2003). Forecasting Volatility in the Financial
Markets. Butterworth-Heinemann. ISBN 978-0750655156.
Value at risk
From Wikipedia, the free encyclopedia
In financial mathematics and financial risk management, Value at Risk (VaR) is a widely used
measure of the risk of loss on a specific portfolio of financial assets. For a given portfolio,
probability and time horizon, VaR is defined as a threshold value such that the probability that
the mark-to-market loss on the portfolio over the given time horizon exceeds this value
(assuming normal markets and no trading in the portfolio) is the given probability level.[1]
For example, if a portfolio of stocks has a one-day 5% VaR of $1 million, there is a 5%
probability that the portfolio will fall in value by more than $1 million over a one day period,
assuming markets are normal and there is no trading. Informally, a loss of $1 million or more on
this portfolio is expected on 1 day in 20. A loss which exceeds the VaR threshold is termed a
“VaR break.”[2]
[Figure: the 10% Value at Risk of a normally distributed portfolio's returns]
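The one-day 5% VaR in the example above can be sketched by Monte Carlo; the normality assumption and the $1 million daily volatility are hypothetical inputs for illustration only:

```python
import random

random.seed(0)
# Hypothetical portfolio: normally distributed daily P&L with sigma = $1,000,000.
pnl = [random.gauss(0.0, 1_000_000) for _ in range(100_000)]

losses = sorted(-x for x in pnl)             # losses as positive numbers
var_95 = losses[int(0.95 * len(losses))]     # 95th-percentile loss = one-day 5% VaR

# For a normal distribution this sits near 1.645 * sigma; a loss exceeding
# var_95 (a "VaR break") is expected on about 1 day in 20.
print(round(var_95))
```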
VaR has five main uses in finance: risk management, risk measurement, financial control,
financial reporting and computing regulatory capital. VaR is sometimes used in non-financial
applications as well.[3]
Important related ideas are economic capital, backtesting, stress testing and expected shortfall.[4]
[edit]Details
Common parameters for VaR are 1% and 5% probabilities and one day and two week horizons,
although other combinations are in use.[5]
The reason for assuming normal markets and no trading, and for restricting loss to things
measured in daily accounts, is to make the loss observable. In some extreme financial events it
can be impossible to determine losses, either because market prices are unavailable or because
the loss-bearing institution breaks up. Some longer-term consequences of disasters, such as
lawsuits, loss of market confidence and employee morale and impairment of brand names can
take a long time to play out, and may be hard to allocate among specific prior decisions. VaR
marks the boundary between normal days and extreme events. Institutions can lose far more than
the VaR amount; all that can be said is that they will not do so very often.[6]
The probability level is about equally often specified as one minus the probability of a VaR
break, so that the VaR in the example above would be called a one-day 95% VaR instead of one-
day 5% VaR. This generally does not lead to confusion because the probability of VaR breaks is
almost always small, certainly less than 0.5.[1]
Although it virtually always represents a loss, VaR is conventionally reported as a positive
number. A negative VaR would imply the portfolio has a high probability of making a profit, for
example a one-day 5% VaR of negative $1 million implies the portfolio has a 95% chance of
making more than $1 million over the next day. [7]
[edit]Varieties of VaR
The definition of VaR is nonconstructive: it specifies a property VaR must have, but not how to
compute VaR. Moreover, there is wide scope for interpretation in the definition.[8] This has led to
two broad types of VaR, one used primarily in risk management and the other primarily for risk
measurement. The distinction is not sharp, however, and hybrid versions are typically used in
financial control, financial reporting and computing regulatory capital. [9]
To a risk manager, VaR is a system, not a number. The system is run periodically (usually daily)
and the published number is compared to the computed price movement in opening positions
over the time horizon. There is never any subsequent adjustment to the published VaR, and there
is no distinction between VaR breaks caused by input errors (including Information Technology
breakdowns, fraud and rogue trading), computation errors (including failure to produce a VaR on
time) and market movements.[10]
A frequentist claim is made, that the long-term frequency of VaR breaks will equal the specified
probability, within the limits of sampling error, and that the VaR breaks will be independent in
time and independent of the level of VaR. This claim is validated by a backtest, a comparison of
published VaRs to actual price movements. In this interpretation, many different systems could
produce VaRs with equally good backtests, but wide disagreements on daily VaR values.[1]
For risk measurement a number is needed, not a system. A Bayesian probability claim is made,
that given the information and beliefs at the time, the subjective probability of a VaR break was
the specified level. VaR is adjusted after the fact to correct errors in inputs and computation, but
not to incorporate information unavailable at the time of computation.[7] In this context,
“backtest” has a different meaning. Rather than comparing published VaRs to actual market
movements over the period of time the system has been in operation, VaR is retroactively
computed on scrubbed data over as long a period as data are available and deemed relevant. The
same position data and pricing models are used for computing the VaR as determining the price
movements.[2]
Although some of the sources listed here treat only one kind of VaR as legitimate, most of the
recent ones seem to agree that risk management VaR is superior for making short-term and
tactical decisions today, while risk measurement VaR should be used for understanding the past,
and making medium term and strategic decisions for the future. When VaR is used for financial
control or financial reporting it should incorporate elements of both. For example, if a trading
desk is held to a VaR limit, that is both a risk-management rule for deciding what risks to allow
today, and an input into the risk measurement computation of the desk’s risk-adjusted return at
the end of the reporting period.[4]
[edit]VAR in Governance
An interesting takeoff on VaR is its application in Governance for endowments, trusts, and
pension plans. Essentially trustees adopt portfolio Values-at-Risk metrics for the entire pooled
account and the diversified parts individually managed. Instead of probability estimates they
simply define maximum levels of acceptable loss for each. Doing so provides an easy metric for
oversight and adds accountability as managers are then directed to manage, but with the
additional constraint to avoid losses within a defined risk parameter. VaR utilized in this manner
adds relevance as well as an easy-to-monitor risk measurement control far more intuitive than
standard deviation of return. Use of VaR in this context, as well as a worthwhile critique of
board governance practices as they relate to investment management oversight in general, can be
found in "Best Practices in Governance".[11]
[edit]Risk measure and risk metric
The term “VaR” is used both for a risk measure and a risk metric. This sometimes leads to
confusion. Sources earlier than 1995 usually emphasize the risk measure; later sources are more
likely to emphasize the metric.
The VaR risk measure defines risk as mark-to-market loss on a fixed portfolio over a fixed time
horizon, assuming normal markets. There are many alternative risk measures in finance. Instead
of mark-to-market, which uses market prices to define loss, loss is often defined as change in
fundamental value. For example, if an institution holds a loan that declines in market price
because interest rates go up, but has no change in cash flows or credit quality, some systems do
not recognize a loss. Or we could try to incorporate the economic cost of things not measured in
daily financial statements, such as loss of market confidence or employee morale, impairment of
brand names or lawsuits.[4]
Rather than assuming a fixed portfolio over a fixed time horizon, some risk measures incorporate
the effect of expected trading (such as a stop loss order) and consider the expected holding
period of positions. Finally, some risk measures adjust for the possible effects of abnormal
markets, rather than excluding them from the computation.[4]
The VaR risk metric summarizes the distribution of possible losses by a quantile, a point with a
specified probability of greater losses. Common alternative metrics are standard deviation, mean
absolute deviation, expected shortfall and downside risk.[1]
[edit]VaR risk management
Supporters of VaR-based risk management claim the first and possibly greatest benefit of VaR is
the improvement in systems and modeling it forces on an institution. In 1997, Philippe Jorion
wrote:[12]
[T]he greatest benefit of VAR lies in the imposition of a structured methodology for critically thinking
about risk. Institutions that go through the process of computing their VAR are forced to confront their
exposure to financial risks and to set up a proper risk management function. Thus the process of getting to
VAR may be as important as the number itself.
Publishing a daily number, on-time and with specified statistical properties holds every part of a
trading organization to a high objective standard. Robust backup systems and default
assumptions must be implemented. Positions that are reported, modeled or priced incorrectly
stand out, as do data feeds that are inaccurate or late and systems that are too frequently down.
Anything that affects profit and loss that is left out of other reports will show up either in inflated
VaR or excessive VaR breaks. “A risk-taking institution that does not compute VaR might
escape disaster, but an institution that cannot compute VaR will not.” [13]
The second claimed benefit of VaR is that it separates risk into two regimes. Inside the VaR
limit, conventional statistical methods are reliable. Relatively short-term and specific data can be
used for analysis. Probability estimates are meaningful, because there are enough data to test
them. In a sense, there is no true risk because you have a sum of many independent observations
with a left bound on the outcome. A casino doesn't worry about whether red or black will come
up on the next roulette spin. Risk managers encourage productive risk-taking in this regime,
because there is little true cost. People tend to worry too much about these risks, because they
happen frequently, and not enough about what might happen on the worst days.[14]
Outside the VaR limit, all bets are off. Risk should be analyzed with stress testing based on long-
term and broad market data.[15] Probability statements are no longer meaningful.[16] Knowing the
distribution of losses beyond the VaR point is both impossible and useless. The risk manager
should concentrate instead on making sure good plans are in place to limit the loss if possible,
and to survive the loss if not.[1]
One specific system uses three regimes.[17]
1. Out to three times VaR are normal occurrences. You expect periodic VaR
breaks. The loss distribution typically has fat tails, and you might get more
than one break in a short period of time. Moreover, markets may be
abnormal and trading may exacerbate losses, and you may take losses not
measured in daily marks such as lawsuits, loss of employee morale and
market confidence and impairment of brand names. So an institution that
can't deal with three times VaR losses as routine events probably won't
survive long enough to put a VaR system in place.
2. Three to ten times VaR is the range for stress testing. Institutions should be
confident they have examined all the foreseeable events that will cause
losses in this range, and are prepared to survive them. These events are too
rare to estimate probabilities reliably, so risk/return calculations are useless.
3. Foreseeable events should not cause losses beyond ten times VaR. If they do
they should be hedged or insured, or the business plan should be changed to
avoid them, or VaR should be increased. It's hard to run a business if
foreseeable losses are orders of magnitude larger than very large everyday
losses. It's hard to plan for these events, because they are out of scale with
daily experience. Of course there will be unforeseeable losses more than ten
times VaR, but it's pointless to anticipate them: you can't know much about
them, and it results in needless worrying. Better to hope that the discipline of
preparing for all foreseeable three-to-ten times VaR losses will improve
chances for surviving the unforeseen and larger losses that inevitably occur.
"A risk manager has two jobs: make people take more risk the 99% of the time it is safe to do so,
and survive the other 1% of the time. VaR is the border."[13]
[edit]VaR risk measurement
The VaR risk measure is a popular way to aggregate risk across an institution. Individual
business units have risk measures such as duration for a fixed income portfolio or beta for an
equity business. These cannot be combined in a meaningful way.[1] It is also difficult to aggregate
results available at different times, such as positions marked in different time zones, or a high
frequency trading desk with a business holding relatively illiquid positions. But since every
business contributes to profit and loss in an additive fashion, and many financial businesses
mark-to-market daily, it is natural to define firm-wide risk using the distribution of possible
losses at a fixed point in the future.[4]
In risk measurement, VaR is usually reported alongside other risk metrics such as standard
deviation, expected shortfall and “greeks” (partial derivatives of portfolio value with respect to
market factors). VaR is a distribution-free metric, that is it does not depend on assumptions about
the probability distribution of future gains and losses.[13] The probability level is chosen deep
enough in the left tail of the loss distribution to be relevant for risk decisions, but not so deep as
to be difficult to estimate with accuracy.[18]
Risk measurement VaR is sometimes called parametric VaR. This usage can be confusing,
however, because it can be estimated either parametrically (for example, variance-covariance
VaR or delta-gamma VaR) or nonparametrically (for example, historical simulation VaR or
resampled VaR). The inverse usage makes more logical sense, because risk management VaR is
fundamentally nonparametric, but it is seldom referred to as nonparametric VaR.[4][6]
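As a sketch of the nonparametric approach, historical simulation VaR simply reads a quantile off the empirical loss distribution. The P&L series and helper function below are hypothetical:

```python
import numpy as np

def historical_var(pnl, level=0.99):
    """One-day VaR by historical simulation: the loss quantile
    exceeded on roughly (1 - level) of past days."""
    losses = -np.asarray(pnl, dtype=float)   # express losses as positive numbers
    return float(np.quantile(losses, level))

# Hypothetical daily P&L history (profits positive, losses negative)
pnl = [5, -2, 3, -8, 1, -1, 4, -20, 2, 6, -3, 0, 7, -5, 2, -12, 3, 1, -4, 5]
var_99 = historical_var(pnl, 0.99)           # one-day 99% VaR
```

Resampled VaR would bootstrap from the same empirical distribution rather than read the quantile off directly.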
History of VaR
The problem of risk measurement is an old one in statistics, economics and finance. Financial
risk management has been a concern of regulators and financial executives for a long time as
well. Retrospective analysis has found some VaR-like concepts in this history. But VaR did not
emerge as a distinct concept until the late 1980s. The triggering event was the stock market crash
of 1987. This was the first major financial crisis in which a lot of academically-trained quants
were in high enough positions to worry about firm-wide survival.[1]
The crash was so unlikely given standard statistical models that it called the entire basis of quant
finance into question. A reconsideration of history led some quants to decide there were
recurring crises, about one or two per decade, that overwhelmed the statistical assumptions
embedded in models used for trading, investment management and derivative pricing. These
affected many markets at once, including ones that were usually not correlated, and seldom had
discernible economic cause or warning (although after-the-fact explanations were plentiful).[16]
Much later, they were named "Black Swans" by Nassim Taleb and the concept extended far
beyond finance.[19]
If these events were included in quantitative analysis they dominated results and led to strategies
that did not work day to day. If these events were excluded, the profits made in between "Black
Swans" could be much smaller than the losses suffered in the crisis. Institutions could fail as a
result.[13][16][19]
VaR was developed as a systematic way to segregate extreme events, which are studied
qualitatively over long-term history and broad market events, from everyday price movements,
which are studied quantitatively using short-term data in specific markets. It was hoped that
"Black Swans" would be preceded by increases in estimated VaR or increased frequency of VaR
breaks, in at least some markets. The extent to which this has proven true is controversial.[16]
Abnormal markets and trading were excluded from the VaR estimate in order to make it
observable.[14] It is not always possible to define loss if, for example, markets are closed as after
9/11, or severely illiquid, as happened several times in 2008.[13] Losses can also be hard to define
if the risk-bearing institution fails or breaks up.[14] A measure that depends on traders taking
certain actions, and avoiding other actions, can lead to self-reference.[1]
This is risk management VaR. It was well-established in quantitative trading groups at several
financial institutions, notably Bankers Trust, before 1990, although neither the name nor the
definition had been standardized. There was no effort to aggregate VaRs across trading desks.[16]
The financial events of the early 1990s found many firms in trouble because the same underlying
bet had been made at many places in the firm, in non-obvious ways. Since many trading desks
already computed risk management VaR, and it was the only common risk measure that could be
both defined for all businesses and aggregated without strong assumptions, it was the natural
choice for reporting firmwide risk. J. P. Morgan CEO Dennis Weatherstone famously called for
a “4:15 report” that combined all firm risk on one page, available within 15 minutes of the
market close.[8]
Risk measurement VaR was developed for this purpose. Development was most extensive at J.
P. Morgan, which published the methodology and gave free access to estimates of the necessary
underlying parameters in 1994. This was the first time VaR had been exposed beyond a
relatively small group of quants. Two years later, the methodology was spun off into an
independent for-profit business now part of RiskMetrics Group.[8]
In 1997, the U.S. Securities and Exchange Commission ruled that public corporations must
disclose quantitative information about their derivatives activity. Major banks and dealers chose
to implement the rule by including VaR information in the notes to their financial statements.[1]
Worldwide adoption of the Basel II Accord, beginning in 1999 and nearing completion today,
gave further impetus to the use of VaR. VaR is the preferred measure of market risk, and
concepts similar to VaR are used in other parts of the accord.[1]
Mathematics
"Given some confidence level α ∈ (0, 1), the VaR of the portfolio at the confidence level α is
given by the smallest number l such that the probability that the loss L exceeds l is not larger
than (1 − α)"[3]
VaR_α(L) = inf{ l ∈ ℝ : P(L > l) ≤ 1 − α } = inf{ l ∈ ℝ : F_L(l) ≥ α }
The left equality is a definition of VaR. The right equality assumes an underlying probability
distribution, which makes it true only for parametric VaR. Risk managers typically assume that
some fraction of the bad events will have undefined losses, either because markets are closed or
illiquid, or because the entity bearing the loss breaks apart or loses the ability to compute
accounts. Therefore, they do not accept results based on the assumption of a well-defined
probability distribution.[6] Nassim Taleb has labeled this assumption, "charlatanism."[20] On the
other hand, many academics prefer to assume a well-defined distribution, albeit usually one with
fat tails.[1] This point has probably caused more contention among VaR theorists than any other.[8]
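For contrast with the distribution-free definition, a parametric (variance-covariance) VaR under an assumed normal P&L distribution reduces to a closed form; the mean and standard deviation below are hypothetical:

```python
from statistics import NormalDist

def normal_var(mu, sigma, level=0.99):
    """Parametric VaR assuming daily P&L ~ Normal(mu, sigma):
    the loss threshold exceeded with probability 1 - level."""
    z = NormalDist().inv_cdf(level)   # standard normal quantile, ~2.326 at 99%
    return z * sigma - mu

var_99 = normal_var(mu=0.0, sigma=10.0)   # hypothetical daily P&L statistics
```

It is exactly this assumption of a well-defined distribution that the risk managers quoted above reject.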
Criticism
VaR has been controversial since it moved from trading desks into the public eye in 1994. A
famous 1997 debate between Nassim Taleb and Philippe Jorion set out some of the major points
of contention. Taleb claimed VaR:[21]
1. Ignored 2,500 years of experience in favor of untested models built by non-
traders
2. Was charlatanism because it claimed to estimate the risks of rare events,
which is impossible
3. Gave false confidence
4. Would be exploited by traders
More recently David Einhorn and Aaron Brown debated VaR in Global Association of Risk
Professionals Review.[13][22] Einhorn compared VaR to “an airbag that works all the time, except
when you have a car accident.” He further charged that VaR:
1. Led to excessive risk-taking and leverage at financial institutions
2. Focused on the manageable risks near the center of the distribution and
ignored the tails
3. Created an incentive to take “excessive but remote risks”
4. Was “potentially catastrophic when its use creates a false sense of security
among senior executives and watchdogs.”
New York Times reporter Joe Nocera wrote an extensive piece Risk Mismanagement[23] on
January 4, 2009 discussing the role VaR played in the Financial crisis of 2007-2008. After
interviewing risk managers (including several of the ones cited above) the article suggests that
VaR was very useful to risk experts, but nevertheless exacerbated the crisis by giving false
security to bank executives and regulators. A powerful tool for professional risk managers, VaR
is portrayed as both easy to misunderstand, and dangerous when misunderstood.
A common complaint among academics is that VaR is not subadditive.[4] That means the VaR of
a combined portfolio can be larger than the sum of the VaRs of its components. To a practicing
risk manager this makes sense. For example, the average bank branch in the United States is
robbed about once every ten years. A single-branch bank has about 0.004% chance of being
robbed on a specific day, so the risk of robbery would not figure into one-day 1% VaR. It would
not even be within an order of magnitude of that, so it is in the range where the institution should
not worry about it, it should insure against it and take advice from insurers on precautions. The
whole point of insurance is to aggregate risks that are beyond individual VaR limits, and bring
them into a large enough portfolio to get statistical predictability. It does not pay for a one-
branch bank to have a security expert on staff.
As institutions get more branches, the risk of a robbery on a specific day rises to within an order
of magnitude of VaR. At that point it makes sense for the institution to run internal stress tests
and analyze the risk itself. It will spend less on insurance and more on in-house expertise. For a
very large banking institution, robberies are a routine daily occurrence. Losses are part of the
daily VaR calculation, and tracked statistically rather than case-by-case. A sizable in-house
security department is in charge of prevention and control, the general risk manager just tracks
the loss like any other cost of doing business.
As portfolios or institutions get larger, specific risks change from low-probability/low-
predictability/high-impact to statistically predictable losses of low individual impact. That means
they move from the range of far outside VaR, to be insured, to near outside VaR, to be analyzed
case-by-case, to inside VaR, to be treated statistically.[13]
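The subadditivity complaint can be made concrete with a toy example (numbers hypothetical): two independent positions each lose 100 with probability 0.04, so each alone has a 95% VaR of zero, while the combined portfolio does not:

```python
from itertools import product

def var(outcomes, level=0.95):
    """Smallest loss l such that P(loss > l) <= 1 - level."""
    for l, _ in sorted(outcomes):
        tail = sum(p for loss, p in outcomes if loss > l)
        if tail <= 1 - level:
            return l

single = [(0.0, 0.96), (100.0, 0.04)]      # (loss, probability)
combined = [(l1 + l2, p1 * p2)
            for (l1, p1), (l2, p2) in product(single, single)]

var_single = var(single)       # 0.0  : P(loss > 0) = 0.04 <= 0.05
var_combined = var(combined)   # 100.0: P(loss > 0) = 0.0784 > 0.05
```

The sum of the individual VaRs is zero, yet the portfolio VaR is 100: VaR is not subadditive.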
Even VaR supporters generally agree there are common abuses of VaR:[6][8]
1. Referring to VaR as a "worst-case" or "maximum tolerable" loss. In fact, you
expect two or three losses per year that exceed one-day 1% VaR.
2. Making VaR control or VaR reduction the central concern of risk
management. It is far more important to worry about what happens when
losses exceed VaR.
3. Assuming plausible losses will be less than some multiple, often three, of
VaR. The entire point of VaR is that losses can be extremely large, and
sometimes impossible to define, once you get beyond the VaR point. To a risk
manager, VaR is the level of losses at which you stop trying to guess what
will happen next, and start preparing for anything.
4. Reporting a VaR that has not passed a backtest. Regardless of how VaR is
computed, it should have produced the correct number of breaks (within
sampling error) in the past. A common specific violation of this is to report a
VaR based on the unverified assumption that everything follows a
multivariate normal distribution.
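Abuse 4 can be checked mechanically. A minimal backtest, sketched below with hypothetical numbers, asks how likely the observed number of breaks would be if the reported VaR level were correct:

```python
from math import comb

def break_probability(n_days, n_breaks, p=0.01):
    """P(observing n_breaks or more VaR exceptions in n_days)
    if the stated VaR level is correct (one-sided binomial tail)."""
    return sum(comb(n_days, k) * p**k * (1 - p)**(n_days - k)
               for k in range(n_breaks, n_days + 1))

p_val = break_probability(250, 9)   # 250 trading days, 9 observed breaks
```

For 250 trading days at the 1% level about 2.5 breaks are expected; nine observed would be strong evidence the reported VaR is understated.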
Arbitrage pricing theory (APT), in finance, is a general theory of asset pricing that has
become influential in the pricing of stocks.
APT holds that the expected return of a financial asset can be modeled as a linear function of
various macro-economic factors or theoretical market indices, where sensitivity to changes in
each factor is represented by a factor-specific beta coefficient. The model-derived rate of return
will then be used to price the asset correctly - the asset price should equal the expected end of
period price discounted at the rate implied by the model. If the price diverges, arbitrage should bring
it back into line.
The theory was initiated by the economist Stephen Ross in 1976.
The APT model
Risky asset returns are said to follow a factor structure if they can be expressed as:
r_j = a_j + b_{j1} F_1 + b_{j2} F_2 + ... + b_{jn} F_n + ε_j
where a_j is a constant for asset j, F_k is a systematic (macroeconomic) factor, b_{jk} is the
sensitivity of asset j to factor k, and ε_j is the asset's idiosyncratic shock with zero mean.
The APT states that in equilibrium the expected return is
E(r_j) = r_f + b_{j1} RP_1 + b_{j2} RP_2 + ... + b_{jn} RP_n
where r_f is the risk-free rate and RP_k is the risk premium of factor k.
That is, the expected return of an asset j is a linear function of the asset's sensitivities to the n
factors.
Note that there are some assumptions and requirements that have to be fulfilled for the latter to
be correct: There must be perfect competition in the market, and the total number of factors may
never surpass the total number of assets (in order to avoid the problem of matrix singularity).
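Under these assumptions the expected-return relation is a plain linear combination, as the toy numbers below illustrate (all betas and risk premia hypothetical):

```python
def apt_expected_return(risk_free, betas, premia):
    """E(r_j) = r_f + sum of beta_jk * RP_k over the n factors."""
    return risk_free + sum(b * rp for b, rp in zip(betas, premia))

# Asset sensitive to three macro factors:
er = apt_expected_return(risk_free=0.03,
                         betas=[1.2, 0.5, -0.3],      # hypothetical b_jk
                         premia=[0.04, 0.02, 0.01])   # hypothetical RP_k
# 0.03 + 1.2*0.04 + 0.5*0.02 + (-0.3)*0.01 = 0.085
```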
Arbitrage and the APT
Arbitrage is the practice of taking advantage of a state of imbalance between two (or possibly
more) markets and thereby making a risk-free profit; see Rational pricing.
Arbitrage in expectations
The CAPM and its extensions are based on specific assumptions on investors' asset demand. For
example:
• Investors care only about mean return and variance.
• Investors hold only traded assets.
Arbitrage mechanics
In the APT context, arbitrage consists of trading in two assets – with at least one being
mispriced. The arbitrageur sells the asset which is relatively too expensive and uses the proceeds
to buy one which is relatively too cheap.
Under the APT, an asset is mispriced if its current price diverges from the price predicted by the
model. The asset price today should equal the sum of all future cash flows discounted at the APT
rate, where the expected return of the asset is a linear function of various factors, and sensitivity
to changes in each factor is represented by a factor-specific beta coefficient.
A correctly priced asset here may be in fact a synthetic asset - a portfolio consisting of other
correctly priced assets. This portfolio has the same exposure to each of the macroeconomic
factors as the mispriced asset. The arbitrageur creates the portfolio by identifying n + 1 correctly
priced assets (one per factor, plus one) and then weighting the assets such that portfolio beta per
factor is the same as for the mispriced asset.
When the investor is long the asset and short the portfolio (or vice versa) he has created a
position which has a positive expected return (the difference between asset return and portfolio
return) and which has a net-zero exposure to any macroeconomic factor and is therefore risk free
(other than for firm specific risk). The arbitrageur is thus in a position to make a risk-free profit:
Where today's price is too low:
The implication is that at the end of the period the portfolio would have
appreciated at the rate implied by the APT, whereas the mispriced asset
would have appreciated at more than this rate. The arbitrageur could
therefore:
Today: short sell the portfolio and buy the mispriced asset with the proceeds.
At the end of the period: sell the mispriced asset, use the proceeds to buy back the
portfolio, and pocket the difference.
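The construction of the synthetic portfolio described above amounts to solving a small linear system: n + 1 weights matching n factor betas plus the budget constraint. A numerical sketch with hypothetical betas:

```python
import numpy as np

# Hypothetical betas of 3 correctly priced assets on 2 factors (rows: assets)
B = np.array([[1.0, 0.2],
              [0.3, 1.1],
              [0.8, 0.8]])
target = np.array([0.9, 0.6])     # betas of the mispriced asset

# Match each factor beta, and make the weights sum to one:
A = np.vstack([B.T, np.ones(3)])
b = np.append(target, 1.0)
w = np.linalg.solve(A, b)         # weights of the synthetic portfolio
```

Long the mispriced asset and short this portfolio (or vice versa), the position has zero exposure to each factor.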
Liquidity premium
From Wikipedia, the free encyclopedia
Liquidity premium is a term used to explain the difference in price between two financial
securities (e.g. stocks) that have identical qualities except liquidity. For example:
Liquidity premium is a segment of a three-part theory that works to explain the behavior of
yield curves for interest rates. The upwards-curving component of the interest yield can be
explained by the liquidity premium. The reason behind this is that short term securities are less
risky compared to long term securities due to the difference in maturity dates. Therefore investors
expect a premium, or risk premium, for investing in the riskier long-term security.
or
Assets that are traded on an organized market are more liquid. Financial disclosure requirements
are more stringent for quoted companies. For a given economic result, organized liquidity and
transparency make the value of a quoted share higher than the market value of an unquoted share.
The difference in the prices of two assets, which are similar in all aspects except liquidity, is
called the liquidity premium.
Flight-to-quality
A flight-to-quality is a stock market phenomenon occurring when investors sell what they
perceive to be higher-risk investments and purchase safer investments, such as US Treasuries,
gold or land. This is considered a sign of fear in the marketplace, as investors seek less risk in
exchange for lower profits.
A flight-to-quality also shows up as increased demand for government-backed assets,
accompanied by a decline in demand for private-sector assets.
Forward measure
In finance, a T-forward measure is a pricing measure absolutely continuous with respect to a
risk-neutral measure but rather than using the money market as numeraire, it uses a bond with
maturity T.
Mathematical definition
Let
P(0,T) = E^{Q*}[ B(0) / B(T) ]
be the discount factor in the market at time 0 for maturity T, where B(t) is the money market
(bank) account with B(0) = 1. If Q* is the risk neutral measure, then the forward measure Q^T is
defined via the Radon–Nikodym derivative given by
dQ^T / dQ* = 1 / ( B(T) P(0,T) )
Note that this implies that the forward measure and the risk neutral measure coincide when
interest rates are deterministic. Also, this is a particular form of the change of numeraire formula
by changing the numeraire from the money market or bank account B(t) to a T-maturity bond
P(t,T). Indeed, if in general
P(t,T) = E^{Q*}[ B(t) / B(T) | F(t) ]
is the price of a zero coupon bond at time t for maturity T, where F(t) is the filtration denoting
market information at time t, then we can write
dQ^T / dQ* |_{F(t)} = B(0) P(t,T) / ( B(t) P(0,T) )
from which it is indeed clear that the forward T measure is associated to the T-maturity zero
coupon bond as numeraire. For a more detailed discussion see Brigo and Mercurio (2001).
Consequences
Under the forward measure, forward prices are martingales. Compare with futures prices, which
are martingales under the risk neutral measure. Note that when interest rates are deterministic,
this implies that forward prices and futures prices are the same.
For example, the discounted stock price is a martingale under the risk-neutral measure:
S(t) / B(t) = E^{Q*}[ S(T) / B(T) | F(t) ]
By the abstract Bayes' rule, with dQ^T/dQ* = 1 / ( B(T) P(0,T) ),
E^{Q^T}[ S(T) | F(t) ] = E^{Q*}[ S(T) / B(T) | F(t) ] / E^{Q*}[ 1 / B(T) | F(t) ]
The last term is equal to P(t,T) / B(t) by definition of the bond price, so that we get
E^{Q^T}[ S(T) | F(t) ] = ( S(t) / B(t) ) / ( P(t,T) / B(t) ) = S(t) / P(t,T) = F_S(t,T)
that is, the forward price is a martingale under the T-forward measure.
The law of one price is an economic law stated as: "In an efficient market all identical goods
must have only one price."
The intuition for this law is that all sellers will flock to the highest prevailing price, and all
buyers to the lowest current market price. In an efficient market the convergence on one price is
instant.
An example: Financial markets
Commodities can be traded on financial markets, where there will be a single offer price, and bid
price. Although there is a small spread between these two values the law of one price applies (to
each). No trader will sell the commodity at a lower price than the market maker's offer-level or
buy at a higher price than the market maker's bid-level. In either case moving away from the
prevailing price would either leave no takers, or be charity.
In the derivatives market the law applies to financial instruments which appear different, but
which resolve to the same set of cash flows; see Rational pricing. Thus:
"a security must have a single price, no matter how that security is created.
For example, if an option can be created using two different sets of
underlying securities, then the total price for each would be the same or else
an arbitrage opportunity would exist." A similar argument can be used by
considering Arrow securities as alluded to by Arrow and Debreu (1944).[1]
Where the law does not apply
• The law does not apply intertemporally, so prices for the same item can be
different at different times in one market. The application of the law to
financial markets in the example above is obscured by the fact that the
market maker's prices are continually moving in liquid markets. However, at
the moment each trade is executed, the law is in force (it would normally be
against exchange rules to break it).
• The law also need not apply if buyers have less than perfect information
about where to find the lowest price. In this case, sellers face a tradeoff
between the frequency and the profitability of their sales. That is, firms may
be indifferent between posting a high price (thus selling infrequently,
because most consumers will search for a lower one) and a low price (at
which they will sell more often, but earn less profit per sale).[2]
• The Balassa-Samuelson effect argues that the law of one price is not
applicable to all goods internationally, because some goods are not tradable.
It argues that consumption may be cheaper in some countries than
others, because nontradables (especially land and labor) are cheaper in less
developed countries. This can make a typical consumption basket cheaper in
a less developed country, even if some goods in that basket have their prices
equalized by international trade.
Apparent violations
• The best-known example of an apparent violation of the law was Royal
Dutch / Shell shares. After merging in 1907, holders of Royal Dutch Petroleum
(traded in Amsterdam) and Shell Transport shares (traded in London) were
entitled to 60% and 40% respectively of all future profits. Royal Dutch shares
should therefore automatically have been priced at 50% more than Shell
shares. However, they diverged from this by up to 15%.[3] This discrepancy
disappeared with their final merger in 2005.
Rational pricing
Rational pricing is the assumption in financial economics that asset prices (and hence asset
pricing models) will reflect the arbitrage-free price of the asset as any deviation from this price
will be "arbitraged away". This assumption is useful in pricing fixed income securities,
particularly bonds, and is fundamental to the pricing of derivative instruments.
Arbitrage mechanics
Arbitrage is the practice of taking advantage of a state of imbalance between two (or possibly
more) markets. Where this mismatch can be exploited (i.e. after transaction costs, storage costs,
transport costs, dividends etc.) the arbitrageur "locks in" a risk free profit without investing any
of his own money.
In general, arbitrage ensures that "the law of one price" will hold; arbitrage also equalises the
prices of assets with identical cash flows, and sets the price of assets with known future cash
flows.
The law of one price
The same asset must trade at the same price on all markets ("the law of one price"). Where this is
not true, the arbitrageur will:
1. buy the asset on the market where it has the lower price, and simultaneously
sell it (short) on the second market at the higher price
2. deliver the asset to the buyer and receive that higher price
3. pay the seller on the cheaper market with the proceeds and pocket the
difference.
Assets with identical cash flows
Two assets with identical cash flows must trade at the same price. Where this is not true, the
arbitrageur will:
1. sell the asset with the higher price (short sell) and simultaneously buy the
asset with the lower price
2. fund his purchase of the cheaper asset with the proceeds from the sale of the
expensive asset and pocket the difference
3. deliver on his obligations to the buyer of the expensive asset, using the cash
flows from the cheaper asset.
An asset with a known future-price
An asset with a known price in the future must today trade at that price discounted at the risk
free rate.
Note that this condition can be viewed as an application of the above, where the two assets in
question are the asset to be delivered and the risk free asset.
(a) where the discounted future price is higher than today's price:
1. The arbitrageur agrees to deliver the asset on the future date (i.e. sells
forward) and simultaneously buys it today with borrowed money.
2. On the delivery date, the arbitrageur hands over the underlying, and receives
the agreed price.
3. He then repays the lender the borrowed amount plus interest.
4. The difference between the agreed price and the amount owed is the
arbitrage profit.
(b) where the discounted future price is lower than today's price:
1. The arbitrageur agrees to pay for the asset on the future date (i.e. buys
forward) and simultaneously sells (short) the underlying today; he invests the
proceeds.
2. On the delivery date, he cashes in the matured investment, which has
appreciated at the risk free rate.
3. He then takes delivery of the underlying and pays the agreed price using the
matured investment.
4. The difference between the maturity value and the agreed price is the
arbitrage profit.
It will be noted that (b) is only possible for those holding the asset but not needing it until the
future date. There may be few such parties if short-term demand exceeds supply, leading to
backwardation.
Fixed income securities
Rational pricing is one approach used in pricing fixed rate bonds. Here, each cash flow can be
matched by trading in (a) some multiple of a zero-coupon bond corresponding to the coupon
date, and of equivalent credit worthiness (if possible, from the same issuer as the bond being
valued) with the corresponding maturity, or (b) in a corresponding strip and ZCB.
Given that the cash flows can be replicated, the price of the bond must today equal the sum of
each of its cash flows discounted at the same rate as each ZCB, as above. Were this not the case,
arbitrage would be possible and would bring the price back into line with the price based on
ZCBs; see Bond valuation: Arbitrage-free pricing approach.
The pricing formula is as below, where each cash flow C_t is discounted at the rate r_t which
matches that of its coupon date:
Price = Σ_t C_t / (1 + r_t)^t
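A minimal sketch of this discounting, with hypothetical cash flows and zero-coupon rates:

```python
# (year, payment) pairs for a 3-year 5% annual coupon bond, face value 100
cash_flows = [(1, 5.0), (2, 5.0), (3, 105.0)]
zcb_rates = {1: 0.03, 2: 0.035, 3: 0.04}   # hypothetical zero-coupon rates

# Each cash flow discounted at the ZCB rate matching its date:
price = sum(c / (1 + zcb_rates[t]) ** t for t, c in cash_flows)
```

Were the bond quoted away from this price, trading it against the corresponding ZCBs would be the arbitrage.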
Futures
In a futures contract, for no arbitrage to be possible, the price paid on delivery (the forward
price) must be the same as the cost (including interest) of buying and storing the asset. In other
words, the rational forward price represents the expected future value of the underlying
discounted at the risk free rate (the "asset with a known future-price", as above). Thus, for a
simple, non-dividend paying asset, the value of the future/forward, F, will be found by
accumulating the present value S at time t to maturity T by the rate of risk-free return r:
F = S × (1 + r)^(T − t)
This relationship may be modified for storage costs, dividends, dividend yields, and convenience
yields; see futures contract pricing.
Any deviation from this equality allows for arbitrage as follows.
• In the case where the forward price is higher:
1. The arbitrageur sells the futures contract and buys the underlying today (on
the spot market) with borrowed money.
2. On the delivery date, the arbitrageur hands over the underlying, and receives
the agreed forward price.
3. He then repays the lender the borrowed amount plus interest.
4. The difference between the two amounts is the arbitrage profit.
• In the case where the forward price is lower:
1. The arbitrageur buys the futures contract and sells the underlying today (on
the spot market); he invests the proceeds.
2. On the delivery date, he cashes in the matured investment, which has
appreciated at the risk free rate.
3. He then receives the underlying and pays the agreed forward price using the
matured investment. [If he was short the underlying, he returns it now.]
4. The difference between the two amounts is the arbitrage profit.
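The cash-and-carry argument above can be checked numerically (spot, rate and quoted forward are all hypothetical):

```python
spot, r, years = 100.0, 0.05, 1.0
fair_forward = spot * (1 + r) ** years    # rational forward price: 105

quoted_forward = 108.0                    # hypothetical mispriced quote
# Borrow the spot amount, buy the asset today, sell it forward at the quote:
repay = spot * (1 + r) ** years           # owed to the lender at delivery
profit = quoted_forward - repay           # locked-in, risk-free
```

A quote below the fair forward would be exploited symmetrically by selling the asset short and buying forward.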
Options
As above, where the value of an asset in the future is known (or expected), this value can be used
to determine the asset's rational price today. In an option contract, however, exercise is
dependent on the price of the underlying, and hence payment is uncertain. Option pricing models
therefore include logic which either "locks in" or "infers" this future value; both approaches
deliver identical results. Methods which lock-in future cash flows assume arbitrage free pricing,
and those which infer expected value assume risk neutral valuation.
To do this, (in their simplest, though widely used form) both approaches assume a “Binomial
model” for the behavior of the underlying instrument, which allows for only two states - up or
down. If S is the current price, then in the next period the price will either be S up or S down.
Here, the value of the share in the up-state is S × u, and in the down-state is S × d (where u and d
are multipliers with d < 1 < u and assuming d < 1+r < u; see the binomial options model). Then,
given these two states, the "arbitrage free" approach creates a position which will have an
identical value in either state - the cash flow in one period is therefore known, and arbitrage
pricing is applicable. The risk neutral approach infers expected option value from the intrinsic
values at the later two nodes.
Although this logic appears far removed from the Black-Scholes formula and the lattice
approach in the Binomial options model, it in fact underlies both models; see The Black-Scholes
PDE. The assumption of binomial behaviour in the underlying price is defensible as the number
of time steps between today (valuation) and exercise increases, and the period per time-step is
increasingly short. The Binomial options model allows for a high number of very short time-
steps (if coded correctly), while Black-Scholes, in fact, models a continuous process.
The examples below have shares as the underlying, but may be generalised to other instruments.
The value of a put option can be derived as below, or may be found from the value of the call
using put-call parity.
Arbitrage free pricing
Here, the future payoff is "locked in" using either "delta hedging" or the "replicating portfolio"
approach. As above, this payoff is then discounted, and the result is used in the valuation of the
option today.
Delta hedging
It is possible to create a position consisting of Δ calls sold and 1 share, such that the position’s
value will be identical in the S up and S down states, and hence known with certainty (see Delta
hedging). This certain value corresponds to the forward price above ("An asset with a known
future price"), and as above, for no arbitrage to be possible, the present value of the position
must be its expected future value discounted at the risk free rate, r. The value of a call is then
found by equating the two.
1) Solve for Δ such that:
value of position in one period = S × u − Δ × max(S × u − strike price, 0) = S
× d − Δ × max(S × d − strike price, 0)
2) Discount this certain value at the risk free rate, and equate it to the position's cost
today, S − Δ × (call value), to solve for the value of the call.
Under the risk neutral approach, the risk-neutral probability is p = [(1 + r) − d] ÷ [u − d],
and the option value is the p-weighted expected payoff discounted at r.
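A one-period numerical sketch, with hypothetical parameters, shows the delta-hedging value agreeing with the risk-neutral value computed from p:

```python
# Hypothetical one-period binomial tree
S, u, d, r, K = 100.0, 1.2, 0.8, 0.05, 100.0

cu = max(S * u - K, 0.0)                    # call payoff in up state: 20
cd = max(S * d - K, 0.0)                    # call payoff in down state: 0

# Delta hedging: sell delta calls per share so both states have equal value
delta = (S * u - S * d) / (cu - cd)         # 2.0 calls sold per share
certain = S * u - delta * cu                # 80.0 in either state
call_hedge = (S - certain / (1 + r)) / delta

# Risk-neutral valuation: p = [(1 + r) - d] / (u - d)
p = ((1 + r) - d) / (u - d)                 # 0.625
call_rn = (p * cu + (1 - p) * cd) / (1 + r)

assert abs(call_hedge - call_rn) < 1e-9     # both approaches agree
```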
Subsequent valuation
Once traded, swaps can also be priced using rational pricing. For example, the Floating leg of an
interest rate swap can be "decomposed" into a series of Forward rate agreements. Here, since the
swap has identical payments to the FRA, arbitrage free pricing must apply as above - i.e. the
value of this leg is equal to the value of the corresponding FRAs. Similarly, the "receive-fixed"
leg of a swap, can be valued by comparison to a Bond with the same schedule of payments.
(Relatedly, given that their underlyings have the same cash flows, bond options and swaptions
are equatable.)
Pricing shares
The Arbitrage pricing theory (APT), a general theory of asset pricing, has become influential in
the pricing of shares. APT holds that the expected return of a financial asset, can be modelled as
a linear function of various macro-economic factors, where sensitivity to changes in each factor
is represented by a factor specific beta coefficient:
E(r_j) = r_f + b_{j1} RP_1 + b_{j2} RP_2 + ... + b_{jn} RP_n
where r_f is the risk-free rate, RP_k is the risk premium of the k-th factor, and b_{jk} is the
sensitivity of asset j to factor k.
Risk-return spectrum
The risk-return spectrum is the relationship between the amount of return gained on an
investment and the amount of risk undertaken in that investment.[citation needed] The more return
sought, the more risk that must be undertaken.
The progression
There are various classes of possible investments, each with their own positions on the overall
risk-return spectrum. The general progression is: short-term debt; long-term debt; property; high-
yield debt; equity. There is considerable overlap of the ranges for each investment class.
All this can be visualised by plotting expected return on the vertical axis against risk (represented
by the standard deviation of that expected return) on the horizontal axis. This line starts at the
risk-free rate and rises as risk rises. The line will tend to be straight, and will be straight at
equilibrium - see discussion below on domination.
For any particular investment type, the line drawn from the risk-free rate on the vertical axis to
the risk-return point for that investment has a slope called the Sharpe ratio.
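The Sharpe ratio is simply excess expected return per unit of risk; the figures below are hypothetical:

```python
def sharpe(expected_return, risk_free, stdev):
    """Slope of the line from the risk-free rate to the
    investment's (risk, expected return) point."""
    return (expected_return - risk_free) / stdev

s = sharpe(expected_return=0.08, risk_free=0.03, stdev=0.12)
```

A steeper slope means more expected reward per unit of standard deviation taken on.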
Short-term loans to good government bodies
At the lowest end are short-dated loans to government and government-guaranteed entities
(usually semi-independent government departments). The lowest of all is the risk-free rate of
return. The risk-free rate has zero risk (most modern major governments will inflate and
monetise their debts rather than default upon them), but the return is positive because there is
still both the time-preference and inflation premium components of minimum expected rates of
return that must be met or exceeded if the funding is to be forthcoming from providers. The risk-
free rate is commonly approximated by the return paid upon 30-day T-Bills or their equivalent,
but in reality that rate has more to do with the monetary policy of that country's central bank than
the market supply conditions for credit.
[edit]Mid- and long-term loans to good government bodies
The next type of investment is longer-term loans to government, such as 3-year bonds. The
range width is larger, and follows the influence of increasing risk premium required as the
maturity of that debt grows longer. Nevertheless, because it is debt of good government the
highest end of the range is still comparatively low compared to the ranges of other investment
types discussed below.
Also, the lower the jurisdiction of the government in question (i.e., a state or municipal
government rather than a national one), and the smaller that government is, the further along the
risk-return spectrum that government's securities will be.
[edit]Short term loans to blue-chip corporations
Following the lowest-risk investments are short-dated bills of exchange from major blue-chip
corporations with the highest credit ratings. The further a credit rating is from perfect, the
further along the risk-return spectrum that particular return will be.
[edit]Mid- and long-term loans to blue-chip corporations
Overlapping the range for short-term debt is the longer term debt from those same well-rated
corporations. These are higher up the range because the maturity has increased. The overlap
occurs between the mid-term debt of the best-rated corporations and the short-term debt of the
nearly-but-not-perfectly-rated corporations.
In this arena, the debts are called investment grade by the rating agencies. The lower the credit
rating, the higher the yield and thus the expected return.
[edit]Rental property
A commercial property that the investor rents out is comparable in risk and return to low-
investment-grade debt. Industrial property has higher risk and returns, followed by residential
(with the possible exception of the investor's own home).
[edit]High-yield debt
After the returns upon all classes of investment-grade debt come the returns on speculative-
grade high-yield debt (also known derisively as junk bonds). These may come from mid- and
low-rated corporations, and less politically stable governments.
[edit]Equity
Equity returns are the profits earned by businesses after interest and tax. Even the equity returns
on the highest rated corporations are notably risky. Small-cap stocks are generally riskier than
large-cap; companies that primarily service governments, or provide basic consumer goods such
as food or utilities, tend to be less volatile than those in other industries. Note that since stocks
tend to rise when corporate bonds fall and vice-versa, a portfolio containing a small percentage
of stocks can be less risky than one containing only debts.
[edit]Options and futures
Option and futures contracts often provide leverage on underlying stocks, bonds or commodities;
this increases the returns but also the risks. Note that in some cases, derivatives can be used to
hedge, decreasing the overall risk of the portfolio due to negative correlation with other
investments.
[edit]Why the progression?
The existence of risk causes the need to incur a number of expenses. For example, the more risky
the investment the more time and effort is usually required to obtain information about it and
monitor its progress. For another, the importance of a loss of X amount of value is greater than
the importance of a gain of X amount of value, so a riskier investment will attract a higher risk
premium even if the forecast return is the same as upon a less risky investment. Risk is therefore
something that must be compensated for, and the more risk the more compensation required.
If an investment had a high return with low risk, eventually everyone would want to invest there.
That action would drive down the actual rate of return achieved, until it reached the rate of return
the market deems commensurate with the level of risk. Similarly, if an investment had a low
return with high risk, all the present investors would want to leave that investment, which would
then increase the actual return until again it reached the rate of return the market deems
commensurate with the level of risk. That part of total returns which sets this appropriate level is
called the risk premium.
[edit]Leverage extends the spectrum
The use of leverage can extend the progression out even further. Examples of this include
borrowing funds to invest in equities, or use of derivatives.
If leverage is used then there are two lines instead of one. This is because although one can
invest at the risk-free rate, one can only borrow at an interest rate according to one's own credit-
rating. This is visualised by the new line starting at the point of the riskiest unleveraged
investment (equities) and rising at a lower slope than the original line. If this new line were
traced back to the vertical axis of zero risk, it would cross it at the borrowing rate.
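A minimal numeric sketch of the two lines; all rates below are hypothetical:

```python
# Sketch: with leverage there are two lines, because one lends at the
# risk-free rate but borrows at one's own (higher) borrowing rate.

risk_free = 0.04               # lending (risk-free) rate
borrow = 0.06                  # investor's borrowing rate
eq_ret, eq_risk = 0.10, 0.20   # expected return and risk of equities

def leveraged(w):
    """Risk and expected return with weight w in equities (w > 1 means borrowing)."""
    rate = risk_free if w <= 1 else borrow
    return w * eq_risk, rate + w * (eq_ret - rate)

print(leveraged(0.5))   # half in equities, half lent at the risk-free rate
print(leveraged(2.0))   # doubled equity position funded by borrowing
```

Tracing the flatter leveraged segment back to zero risk recovers the 6% borrowing rate, as described above.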
[edit]Domination
All investment types compete against each other, even though they are on different positions on
the risk-return spectrum. Any of the mid-range investments can have their performances
simulated by a portfolio consisting of a risk-free component and the highest-risk component.
This principle, called the separation property, is a crucial feature of Modern Portfolio Theory.
The line is then called the capital market line.
If at any time there is an investment that has a higher Sharpe Ratio than another then that return
is said to dominate. When there are two or more investments above the spectrum line, then the
one with the highest Sharpe Ratio is the most dominant one, even if the risk and return on that
particular investment is lower than another. If every mid-range return falls below the spectrum
line, this means that the highest-risk investment has the highest Sharpe Ratio and so dominates
over all others.
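Dominance can be checked directly by computing Sharpe ratios; the investments and figures below are hypothetical:

```python
# Sketch: ranking investments by Sharpe ratio to find the dominant one
# (highest excess return per unit of risk). All figures are made up.

risk_free = 0.03
investments = {
    "gov bonds":  (0.045, 0.03),   # (expected return, risk)
    "corp bonds": (0.060, 0.08),
    "equities":   (0.100, 0.20),
}

sharpe = {name: (r - risk_free) / s for name, (r, s) in investments.items()}
dominant = max(sharpe, key=sharpe.get)
print(dominant, round(sharpe[dominant], 3))
```

Note that the dominant investment here has the lowest absolute return, illustrating the point that dominance depends on the ratio, not on the raw risk or return.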
If at any time there is an investment that dominates then funds will tend to be withdrawn from all
others and be redirected to that dominating investment. This action will lower the return on that
investment and raise it on others. The withdrawal and redirection of capital ceases when all
returns are at the levels appropriate for the degrees of risk and commensurate with the
opportunity cost arising from competition with the other investment types on the spectrum,
which means they all tend to end up having the same Sharpe Ratio.
[edit]See also
• Modern portfolio theory
• Risk
• Financial capital
• Investment
• Credit
• Interest
• Ownership equity
• Profit
• Leverage
[edit]References
Categories: Finance
Linear combination
From Wikipedia, the free encyclopedia
In mathematics, the linear combination is a concept central to linear algebra and related fields of
mathematics. Most of this article deals with linear combinations in the context of a vector space
over a field, with some generalizations given at the end of the article.
Contents
• 1 Definition
• 2 Examples and counterexamples
○ 2.1 Vectors
○ 2.2 Functions
○ 2.3 Polynomials
• 3 The linear span
• 4 Linear independence
• 5 Affine, conical, and convex combinations
• 6 Operad theory
• 7 Generalizations
[edit]Definition
Suppose that K is a field and V is a vector space over K. As usual, we call elements of V vectors
and call elements of K scalars. If v1,...,vn are vectors and a1,...,an are scalars, then the linear
combination of those vectors with those scalars as coefficients is
a1v1 + a2v2 + ... + anvn
There is some ambiguity in the use of the term "linear combination" as to whether it refers to the
expression or to its value. In most cases the value is meant, as in the assertion "the set of all
linear combinations of v1,...,vn always forms a subspace"; however, one could also say "two
different linear combinations can have the same value" in which case the expression must have
been meant. The subtle difference between these uses is the essence of the notion of linear
dependence: a family F of vectors is linearly independent precisely if any linear combination of
the vectors in F (as value) is uniquely so (as expression). In any case, even when viewed as
expressions, all that matters about a linear combination is the coefficient of each vi; trivial
modifications such as permuting the terms or adding terms with zero coefficient are not
considered to give new linear combinations.
In a given situation, K and V may be specified explicitly, or they may be obvious from context.
In that case, we often speak of a linear combination of the vectorsv1,...,vn, with the coefficients
unspecified (except that they must belong to K). Or, if S is a subset of V, we may speak of a
linear combination of vectors in S, where both the coefficients and the vectors are unspecified,
except that the vectors must belong to the set S (and the coefficients must belong to K). Finally,
we may speak simply of a linear combination, where nothing is specified (except that the vectors
must belong to V and the coefficients must belong to K); in this case one is probably referring to
the expression, since every vector in V is certainly the value of some linear combination.
Note that by definition, a linear combination involves only finitely many vectors (except as
described in Generalizations below). However, the set S that the vectors are taken from (if one
is mentioned) can still be infinite; each individual linear combination will only involve finitely
many vectors. Also, there is no reason that n cannot be zero; in that case, we declare by
convention that the result of the linear combination is the zero vector in V.
[edit]Examples and counterexamples
[edit]Vectors
Let the field K be the set R of real numbers, and let the vector space V be the Euclidean space R3.
Consider the vectors e1 = (1,0,0), e2 = (0,1,0) and e3 = (0,0,1). Then any vector in R3 is a linear
combination of e1, e2 and e3.
To see that this is so, take an arbitrary vector (a1,a2,a3) in R3, and write:
(a1,a2,a3) = a1(1,0,0) + a2(0,1,0) + a3(0,0,1) = a1e1 + a2e2 + a3e3
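This decomposition can be checked numerically; a sketch using NumPy with an arbitrarily chosen vector:

```python
import numpy as np

# Any vector in R^3 is the linear combination a1*e1 + a2*e2 + a3*e3
# of the standard basis vectors, with coefficients equal to its components.

e1, e2, e3 = np.eye(3)          # (1,0,0), (0,1,0), (0,0,1)
a = np.array([2.0, -1.0, 5.0])  # an arbitrary vector

combo = a[0] * e1 + a[1] * e2 + a[2] * e3
print(np.array_equal(combo, a))  # True: the coefficients are the components
```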
[edit]Functions
Let K be the set C of all complex numbers, and let V be the set CC(R) of all continuous functions
from the real line R to the complex plane C. Consider the vectors (functions) f and g defined by
f(t) := eit and g(t) := e−it. (Here, e is the base of the natural logarithm, about 2.71828..., and i is the
imaginary unit, a square root of −1.) Some linear combinations of f and g are:
• cos(t) = (1/2)f(t) + (1/2)g(t)
• 2sin(t) = (−i)f(t) + (i)g(t)
On the other hand, the constant function 3 is not a linear combination of f and g. To see this,
suppose that 3 could be written as a linear combination of eit and e−it. This means that there would
exist complex scalars a and b such that aeit + be−it = 3 for all real numbers t. Setting t = 0 and t =
π gives the equations a + b = 3 and a + b = −3, and clearly this cannot happen. See Euler's
identity.
[edit]Polynomials
Let K be R, C, or any field, and let V be the set P of all polynomials with coefficients taken from
the field K. Consider the vectors (polynomials) p1 := 1, p2 := x + 1, and p3 := x2 + x + 1.
Is the polynomial x2 − 1 a linear combination of p1, p2, and p3? To find out, consider an arbitrary
linear combination of these vectors and try to see when it equals the desired vector x2 − 1.
Picking arbitrary coefficients a1, a2, and a3, we want
a1(1) + a2(x + 1) + a3(x2 + x + 1) = x2 − 1.
Two polynomials are equal if and only if their corresponding coefficients are equal, so we can
conclude
a3 = 1,  a2 + a3 = 0,  a1 + a2 + a3 = −1.
This system of linear equations can easily be solved. The first equation simply says that a3
is 1. Knowing that, we can solve the second equation for a2, which comes out to −1. Finally, the
last equation tells us that a1 is also −1. Therefore, the only possible way to get a linear
combination is with these coefficients. Indeed,
x2 − 1 = −p1 − p2 + p3.
On the other hand, consider the polynomial x3 − 1. When we set corresponding coefficients
equal in this case, the equation for x3 is
0 = 1,
which is always false. Therefore, there is no way for this to work, and x3 − 1 is not a linear
combination of p1, p2, and p3.
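The coefficient-matching system above can equally be solved numerically; a NumPy sketch of the same computation:

```python
import numpy as np

# Solve for coefficients expressing x^2 - 1 in terms of
# p1 = 1, p2 = x + 1, p3 = x^2 + x + 1 by matching coefficients.
# Columns are p1, p2, p3; rows are the coefficients of 1, x, x^2.

A = np.array([[1.0, 1.0, 1.0],   # constant terms of p1, p2, p3
              [0.0, 1.0, 1.0],   # x terms
              [0.0, 0.0, 1.0]])  # x^2 terms
target = np.array([-1.0, 0.0, 1.0])  # x^2 - 1 has coefficients (-1, 0, 1)

a1, a2, a3 = np.linalg.solve(A, target)
print(a1, a2, a3)  # -1.0 -1.0 1.0, as in the text
```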
[edit]The linear span
Main article: linear span
Take an arbitrary field K, an arbitrary vector space V, and let v1,...,vn be vectors (in V). It's
interesting to consider the set of all linear combinations of these vectors. This set is called the
linear span (or just span) of the vectors, say S = {v1,...,vn}. We write the span of S as span(S) or
sp(S):
span(S) = {a1v1 + ... + anvn : a1,...,an ∈ K}
[edit]Linear independence
Main article: Linear independence
For some sets of vectors v1,...,vn, a single vector can be written in two different ways as a linear
combination of them:
v = a1v1 + ... + anvn = b1v1 + ... + bnvn, with (a1,...,an) ≠ (b1,...,bn).
If that is possible, then v1,...,vn are called linearly dependent; otherwise, they are linearly
independent. Similarly, we can speak of linear dependence or independence of an arbitrary set S
of vectors.
If S is linearly independent and the span of S equals V, then S is a basis for V.
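In computation, linear independence is often checked via matrix rank; a sketch with hypothetical vectors:

```python
import numpy as np

# Vectors are linearly independent exactly when the matrix having them
# as columns has rank equal to the number of vectors.

v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([2.0, 4.0, 6.0])   # v2 = 2*v1, so {v1, v2} is dependent
v3 = np.array([0.0, 1.0, 0.0])

dep = np.column_stack([v1, v2])
ind = np.column_stack([v1, v3])

print(np.linalg.matrix_rank(dep))  # 1: rank < 2 vectors, dependent
print(np.linalg.matrix_rank(ind))  # 2: rank == 2 vectors, independent
```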
[edit]Affine, conical, and convex combinations
By restricting the coefficients used in linear combinations, one can define the related concepts of
affine combination, conical combination, and convex combination, and the associated notions of
sets closed under these operations.
Type of combination | Restrictions on coefficients | Name of set | Model space
Linear combination | no restrictions | Vector subspace | Rn
Affine combination | coefficients sum to 1 | Affine subspace | Affine hyperplane
Conical combination | coefficients ≥ 0 | Convex cone | Quadrant/octant
Convex combination | coefficients ≥ 0 and sum to 1 | Convex set | Simplex
Because these are more restricted operations, more subsets will be closed under them, so affine
subsets, convex cones, and convex sets are generalizations of vector subspaces: a vector
subspace is also an affine subspace, a convex cone, and a convex set, but a convex set need not
be a vector subspace, affine, or a convex cone.
These concepts often arise when one can take certain linear combinations of objects, but not any:
for example, probability distributions are closed under convex combination (they form a convex
set), but not conical or affine combinations (or linear), and positive measures are closed under
conical combination but not affine or linear – hence one defines signed measures as the linear
closure.
Linear and affine combinations can be defined over any field (or ring), but conical and convex
combination require a notion of "positive", and hence can only be defined over an ordered field
(or ordered ring), generally the real numbers.
If one allows only scalar multiplication, not addition, one obtains a (not necessarily convex)
cone; one often restricts the definition to only allowing multiplication by positive scalars.
All of these concepts are usually defined as subsets of an ambient vector space (except for affine
spaces, which are also considered as "vector spaces forgetting the origin"), rather than being
axiomatized independently.
[edit]Operad theory
More abstractly, in the language of operad theory, one can consider vector spaces to be algebras
over the operad R∞ (the infinite direct sum, so only finitely many terms are non-zero; this
corresponds to only taking finite sums), which parametrizes linear combinations: the vector
(a1,...,an,0,0,...) corresponds to the linear combination a1v1 + ... + anvn.
[edit]Generalizations
If V is a bimodule over two rings KL and KR, the most general linear combination has the form
a1v1b1 + ... + anvnbn
where a1,...,an belong to KL, b1,...,bn belong to KR, and v1,...,vn belong to V.
Sharpe ratio
From Wikipedia, the free encyclopedia
The Sharpe ratio or Sharpe index or Sharpe measure or reward-to-variability ratio is a
measure of the excess return (or Risk Premium) per unit of risk in an investment asset or a
trading strategy, named after William Forsyth Sharpe. Since its revision by the original author in
1994, it is defined as:
S = E[R − Rf] / σ, where σ = √var[R − Rf],
and where R is the asset return, Rf is the return on a benchmark asset, such as the risk-free rate
of return, E[R − Rf] is the expected value of the excess of the asset return over the benchmark
return, and σ is the standard deviation of the excess return.[1]
Note, if Rf is a constant risk-free return throughout the period, then √var[R − Rf] = √var[R],
since subtracting a constant does not change the variance.
The Sharpe ratio is used to characterize how well the return of an asset compensates the investor
for the risk taken: the higher the Sharpe ratio, the better. When comparing two assets
each with expected return E[R] against the same benchmark with return Rf, the asset with
the higher Sharpe ratio gives more return for the same risk. Investors are often advised to pick
investments with high Sharpe ratios. However, like any mathematical model, it relies on the data
being correct. Pyramid schemes with a long duration of operation would typically provide a high
Sharpe ratio when derived from reported returns, but the inputs are false. When examining the
investment performance of assets with smoothing of returns (such as with-profits funds), the
Sharpe ratio should be derived from the performance of the underlying assets rather than the
fund returns.
Sharpe ratios, along with Treynor ratios and Jensen's alphas, are often used to rank the
performance of portfolio or mutual fund managers.
[edit]History
This ratio was developed by William Forsyth Sharpe in 1966.[2] Sharpe originally called it the
"reward-to-variability" ratio before it began being called the Sharpe Ratio by later academics and
financial operators.
Sharpe's 1994 revision acknowledged that the risk-free rate changes with time. Prior to this
revision the definition was
S = (E[R] − Rf) / σ
assuming a constant Rf.
Recently, the (original) Sharpe ratio has often been challenged with regard to its appropriateness
as a fund performance measure during evaluation periods of declining markets.[3]
[edit]Examples
Suppose the asset has an expected return of 15% in excess of the risk free rate. We typically do
not know if the asset will have this return; suppose we assess the risk of the asset, defined as
standard deviation of the asset's excess return, as 10%. The risk-free return is constant. Then the
Sharpe ratio (using the revised definition) will be 1.5 (E[R − Rf] = 0.15 and σ = 0.10).
As a guide post, one could substitute in the longer-term return of the S&P 500, about 10%.
Assume the risk-free return is 3.5%, and that the average standard deviation of the S&P 500 is
about 16%. Doing the math, the average long-term Sharpe ratio of the US market is about
0.40625 ((10% − 3.5%)/16%). Note, however, that if one were to calculate the ratio over, for
example, three-year rolling periods, the Sharpe ratio could vary dramatically.
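Both worked examples reduce to a one-line calculation; a sketch:

```python
# Sharpe ratio S = E[R - Rf] / sigma, applied to the two examples above.

def sharpe_ratio(expected_return, risk_free, sigma):
    """Excess return per unit of risk."""
    return (expected_return - risk_free) / sigma

# 15% excess return with 10% standard deviation -> ratio of 1.5
print(sharpe_ratio(0.15, 0.0, 0.10))

# Long-term US market: (10% - 3.5%) / 16% -> about 0.406
print(sharpe_ratio(0.10, 0.035, 0.16))
```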
[edit]Strengths and weaknesses
The principal advantage of the Sharpe ratio is that it is directly computable from any observed
series of returns, without need for additional information surrounding the source of profitability.
Other ratios such as the bias ratio have recently been introduced into the literature to handle
cases where the observed volatility may be an especially poor proxy for the risk inherent in a
time-series of observed returns.
[edit]References
1. ^ Sharpe, W. F. (1994). "The Sharpe Ratio". Journal of Portfolio
Management 21 (1): 49–58.
2. ^ Sharpe, W. F. (1966). "Mutual Fund Performance". Journal of Business 39
(S1): 119–138. doi:10.1086/294846.
3. ^ Scholz, Hendrik (2007). "Refinements to the Sharpe ratio: Comparing
alternatives for bear markets". Journal of Asset Management 7 (5): 347–357.
doi:10.1057/palgrave.jam.2250040.
[edit]See also
• Capital asset pricing model
• Jensen's alpha
• Modern portfolio theory
• Roy's safety-first criterion
• Sortino ratio
• Bias ratio (finance)
• Calmar ratio
• Treynor ratio
• Upside potential ratio
• Information ratio
• Coefficient of variation
[edit]External links
• The Sharpe ratio
• How sharp is the Sharpe ratio
[edit]Further reading
• Bruce J. Feibel. Investment Performance Measurement. New York: Wiley,
2003. ISBN 0471268496
Retrieved from "http://en.wikipedia.org/wiki/Sharpe_ratio"
Categories: Financial ratios | Statistical ratios
Market portfolio
From Wikipedia, the free encyclopedia
A market portfolio is a portfolio consisting of a weighted sum of every asset in the market, with
weights in the proportions that they exist in the market (with the necessary assumption that these
assets are infinitely divisible).
Richard Roll's critique (1977) states that this is only a theoretical concept, as to create a market
portfolio for investment purposes in practice would necessarily include every single possible
available asset, including real estate, precious metals, stamp collections, jewelry, and anything
with any worth, as the theoretical market being referred to would be the world market. As a
result, proxies for the market (such as the FTSE100 in the UK, DAX in Germany or the S&P500
in the US) are used in practice by investors. Roll's critique states that these proxies cannot
provide an accurate representation of the entire market.
The concept of a market portfolio plays an important role in many financial theories and models,
including the Capital asset pricing model where it is the only fund in which investors need to
invest, to be supplemented only by a risk-free asset (depending upon each investor's attitude
towards risk).
(rj − rf) = βj(rm − rf)
where rj and rf are the returns to security j and the risk-free rate respectively, rm is the return of
the market portfolio, and βj is security j's sensitivity to the market.
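A numeric sketch of this relation; the rates and beta below are hypothetical:

```python
# CAPM expected-return relation using the market portfolio:
# rj = rf + beta_j * (rm - rf), with made-up inputs.

rf, rm = 0.03, 0.08    # risk-free rate and market-portfolio return
beta_j = 1.4           # security j's sensitivity to the market

rj = rf + beta_j * (rm - rf)
print(round(rj, 3))    # 0.03 + 1.4 * 0.05 = 0.10
```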
[edit]Introduction
The earnings response coefficient, or ERC, is the estimated relationship between equity returns
and the unexpected portion of (i.e., new information in) companies' earnings announcements.
In financial economics, arbitrage pricing theory describes the theoretical relationship between
information that is known to market participants about a particular equity (e.g., a common stock
share of a particular company) and the price of that equity. Under the efficient market
hypothesis, equity prices are expected in the aggregate to reflect all relevant information at a
given time. Market participants with superior information are expected to exploit that
information until share prices have effectively impounded the information. Therefore, in the
aggregate, a portion of changes in a company's share price is expected to result from changes in
the relevant information available to the market. The ERC is an estimate of the change in a
company's stock price due to the information provided in a company's earnings announcement.
The ERC is expressed mathematically as follows:
R = a + b(ern − u) + e
where
R = the expected return
a = the benchmark rate
b = the earnings response coefficient
(ern − u) = unexpected earnings (actual earnings less expected earnings)
e = random movement
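In practice, the coefficient b is estimated by regressing announcement-window returns on unexpected earnings; a sketch with hypothetical (made-up) data:

```python
import numpy as np

# Estimate the ERC as the OLS slope b in R = a + b*(ern - u) + e.

unexpected = np.array([-0.02, -0.01, 0.0, 0.01, 0.02])  # ern - u
returns = np.array([-0.05, -0.02, 0.01, 0.03, 0.06])    # announcement returns

# Design matrix with an intercept column, then least-squares fit.
X = np.column_stack([np.ones_like(unexpected), unexpected])
(a, b), *_ = np.linalg.lstsq(X, returns, rcond=None)
print(round(b, 3))  # estimated earnings response coefficient
```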
Price dispersion
From Wikipedia, the free encyclopedia
In economics, price dispersion is variation in prices across sellers of the same item, holding
fixed the item's characteristics. Price dispersion can be viewed as a measure of trading frictions
(or, tautologically, as a violation of the law of one price). It is often attributed to consumer search
costs or unmeasured attributes (such as the reputation) of the retailing outlets involved. There is a
difference between price dispersion and price discrimination. The latter concept involves a single
provider charging different prices to different customers for an identical good. Price dispersion,
on the other hand, is best thought of as the outcome of many firms potentially charging different
prices, where customers of one firm find it difficult to patronize (or are perhaps unaware of)
other firms due to the existence of search costs.
Price dispersion measures include the range of prices, the percentage difference of highest and
lowest price, the standard deviation of the price distribution, the variance of the price
distribution, and the coefficient of variation of the price distribution.
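These measures are straightforward to compute; a sketch over hypothetical prices for one item across sellers:

```python
import statistics

# The dispersion measures named above, for one item's prices across sellers.

prices = [9.99, 10.49, 11.25, 10.75, 12.00]

price_range = max(prices) - min(prices)
pct_diff = (max(prices) - min(prices)) / min(prices)  # highest vs lowest price
stdev = statistics.pstdev(prices)                     # std dev of the distribution
variance = statistics.pvariance(prices)
coeff_var = stdev / statistics.mean(prices)           # coefficient of variation

print(round(price_range, 2), round(pct_diff, 3), round(coeff_var, 3))
```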
In most of the theoretical literature, price dispersion is argued to result from spatial differences
and the existence of significant search costs. With the development of the internet and shopping
agent programs, conventional wisdom suggests that price dispersion should be alleviated and
may eventually disappear in the online market due to the reduced search cost for both price and
product features. However, recent studies found a surprisingly high level of price dispersion
online, even for standardized items such as books, CDs and DVDs. There is some evidence of a
shrinking of this online price dispersion, but it remains significant. Recently, work has also been
done in the area of e-commerce, specifically the Semantic Web, and its effects on price
dispersion.
Hal Varian, an economist at U. C. Berkeley, argued in a 1980 article that price dispersion may be
an intentional marketing technique to encourage shoppers to explore their options.[1]
A related concept is that of wage dispersion.
[edit]See also
• Law of one price
• Search theory
[edit]References
1. ^ Varian, Hal R., "A Model of Sales" (Sep., 1980), The American Economic
Review, Vol. 70, No. 4 , pp. 651-659.
• Baye, Michael, John Morgan and Patrick Scholten, "Information, Search, and
Price Dispersion," (in Handbook on Economics and Information Systems, T.
Hendershott, Ed., Elsevier, forthcoming)
• Dahlby, Bev and Douglas West, (1986), "Price Dispersion in an Automobile
Market," Journal of Political Economy, 94(2): 418-438.
• Nash-Equilibrium.com
• Venkatesh Shankar, Xing Pan, and Brian T. Ratchford, (2002), "Do Drivers of
Online Price Dispersion Change as Online Markets Grow?," working paper,
December, University of Maryland.
• Cooper, Sean, "Why You Can't Get iPods At a Discount", Slate.
• Gupta, Tanya, and Abir Qasem,(2002), "Reduction of Price Dispersion through
Semantic E-commerce," (in Workshop at WWW2002 International Workshop
on the Semantic Web, Hawaii, May 7, 2002)
• Thiel, Stuart E., "A New Model of Persistent Retail Price Dispersion" (July 6,
2005). Available at SSRN: http://ssrn.com/abstract=757357
Retrieved from "http://en.wikipedia.org/wiki/Price_dispersion"
Categories: Pricing | Economic efficiency | Economics terminology
Search theory
From Wikipedia, the free encyclopedia
This article is about the economics of search problems. For other uses of 'search',
see Searching.
In economics, search theory (or just search) is the study of an individual's optimal strategy
when choosing from a series of potential opportunities of random quality, given that delaying
choice is costly. Search models illustrate how best to balance the cost of delay against the value
of the option to try again.
Two common settings for these models (and their empirical applications) are a worker's search
for a job, in labor economics, and a consumer's search for a product they wish to purchase, in
consumer theory. From a worker's perspective, an acceptable job would be one that pays a high
wage, one that offers desirable benefits, and/or one that offers pleasant and safe working
conditions. From a consumer's perspective, a product worth purchasing would have sufficiently
high quality, and be offered at a sufficiently low price. In both cases, whether a given job or
product is acceptable depends on the searcher's beliefs about the alternatives available in the
market.
Contents
• 1 Search from a known distribution
• 2 Search from an unknown distribution
• 3 Endogenizing the price distribution
• 4 Matching theory
• 5 See also
• 6 References
More recently, especially since the 1990s, many economists have been working on integrating
job search into models of the macroeconomy, using a framework called 'matching theory'
originally developed by Dale Mortensen and extended by Peter A. Diamond and Christopher A.
Pissarides. In this framework, the rate at which new jobs are formed is assumed to depend both
on workers' search decisions, and on firms' decisions to open job vacancies. While some
matching models include a distribution of different wages,[7] others are simplified by ignoring
wage differences. The simplified versions of the model focus instead on the main reduced form
implication of search: namely, the fact that optimal job search takes time, so that workers are
likely to pass through a spell of unemployment before beginning work.[8]
The bias ratio is an indicator used in finance to analyze the returns of investment portfolios, and
in performing due diligence.
The bias ratio is a concrete metric that detects valuation bias or deliberate price manipulation of
portfolio assets by a manager of a hedge fund, mutual fund or similar investment vehicle,
without requiring disclosure (transparency) of the actual holdings. This metric measures
abnormalities in the distribution of returns that indicate subjective pricing. The formulation of
the Bias Ratio stems from an insight into the behavior of asset managers as they address the
expectations of investors with the valuation of assets that determine their performance.
The bias ratio measures how far the returns from an investment portfolio - e.g. one managed by a
hedge fund - are from an unbiased distribution. Thus the bias ratio of a pure equity index will
usually be close to 1. However, if a fund smooths its returns using subjective pricing of illiquid
assets the bias ratio will be higher. As such, it can help identify the presence of illiquid securities
where they are not expected.
The bias ratio was first defined by Adil Abdulali, a risk manager at the investment firm Protégé
Partners. The concepts behind the Bias Ratio were formulated between 2001 and 2003 and used
privately to screen money managers. The first public discussions on the subject took place
in 2004 at New York University's Courant Institute and in 2006 at Columbia University. In 2006,
the Bias Ratio was published in a letter to investors and made available to the public by Riskdata,
a risk management solutions provider, which included it in its standard suite of analytics.
The Bias Ratio has since been used by a number of risk management professionals to spot
suspicious funds that subsequently turned out to be frauds. The most spectacular example was
reported in the Financial Times on January 22, 2009, under the headline “Bias ratio seen to
unmask Madoff”.
Contents
1. Explanation
2. Mathematical formulation
3. Examples and Context
   3.1 Natural Bias Ratios of asset returns
4. Contrast to other metrics
   4.1 Bias Ratios vs. Sharpe Ratios
   4.2 Serial correlation
5. Practical thresholds
6. Uses and limitations
7. See also
8. References
Explanation
Imagine that you are a hedge fund manager who invests in securities that are hard to value, such
as mortgage backed derivatives. Your peer group consists of funds with similar mandates, and all
have track records with high Sharpe ratios, very few down months, and investor demand from
the “one per cent per month” crowd. You are keenly aware that your potential investors look
carefully at the characteristics of returns, including such calculations as the percentage of months
with negative and positive returns.
Furthermore, assume that no pricing service can reliably price your portfolio, and the assets are
often sui generis with no quoted market. In order to price the portfolio for return calculations,
you poll dealers for prices on each security monthly and get results that vary widely on each
asset. The following real-world example illustrates this theoretical construct.
Table 1
When pricing this portfolio, standard market practice allows a manager to discard outliers and
average the remaining prices. But what constitutes an outlier? Market participants contend that
outliers are difficult to characterize methodically and thus use the heuristic rule “you know it
when you see it.” Identifying an outlier requires weighing the particular security’s characteristics and liquidity,
as well as the market environment in which quotes are solicited. After discarding outliers, a
manager sums up the relevant figures and determines the net asset value (“NAV”). Now let’s
consider what happens when this NAV calculation results in a small monthly loss, such as
-0.01%. Lo and behold, just before the CFO publishes the return, an aspiring junior analyst
notices that the pricing process included a dealer quote 50% below all the other prices for that
security. Throwing out that one quote would raise the monthly return to +0.01%.
A manager with high integrity faces two pricing alternatives. Either the manager can close the
books, report the -0.01% return, and ignore new information, ensuring the consistency of the
pricing policy (Option 1) or the manager can accept the improved data, report the +0.01% return,
and document the reasons for discarding the quote (Option 2).
Figure 1
The smooth blue histogram represents a manager who employed Option 1, and the kinked red
histogram represents a manager who chose Option 2 in those critical months. Given the
proclivity of Hedge Fund investors for consistent, positive monthly returns, many a smart
businessman might choose Option 2, resulting in more frequent small positive results and far
fewer small negative ones than in Option 1. The “reserve” that allows “false positives” with
regularity is evident in the unusual hump at the -1.5 Standard Deviation point. This psychology
is summed up in a phrase often heard on trading desks on Wall Street, “let us take the pain now!”
The geometry of this behavior in figure 1 is the area in between the blue line and the red line
from -1σ to 0.0, which has been displaced, like toothpaste squeezed from a tube, farther out into
negative territory.
By itself, such a small cover-up might not concern some beyond the irritation of misstated return
volatility. However, the empirical evidence that justifies using a “Slippery Slope” argument here
includes almost every mortgage backed fund that has blown up because of valuation problems,
such as the Safe Harbor fund, and equity funds such as the Bayou fund. Both funds ended up
perpetrating outright fraud born from minor cover-ups. More generally, financial history has
several well known examples where hiding small losses eventually led to fraud such as the
Sumitomo copper affair as well as the demise of Barings Bank.
Mathematical formulation
Although the hump at -σ is difficult to model, behavior-induced modifications manifest
themselves in the shape of the return histogram around a small neighborhood of zero. This shape
is approximated by a straightforward formula.
Let [0, +σ] be the closed interval from zero to +1 standard deviation of returns (including zero).
Let [-σ, 0) be the half-open interval from -1 standard deviation of returns to zero (including -σ
and excluding zero).
Let r_i be the return in month i, 1 ≤ i ≤ n, where n is the number of monthly returns.
Then:

    BR = Count{ r_i : r_i ∈ [0, +σ] } / (1 + Count{ r_i : r_i ∈ [-σ, 0) })

The Bias Ratio roughly approximates the ratio between the area under the return histogram near
zero in the first quadrant and the similar area in the second quadrant. It has the following
properties:
a. 0 ≤ BR ≤ n
b. If there are no returns in [0, +σ], then BR = 0.
The Bias Ratio defined by a 1σ interval around zero works well to discriminate amongst hedge
funds. Other intervals provide metrics with varying resolutions, but these tend towards 0 as the
interval shrinks.
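As a concrete sketch, the ratio can be computed directly from a monthly return series. The simulation below is illustrative only; the return parameters and the smoothing rule are assumptions designed to mimic the Option 2 behavior described earlier:

```python
import numpy as np

def bias_ratio(returns):
    """Count of returns in [0, +sigma] divided by one plus the
    count of returns in [-sigma, 0)."""
    r = np.asarray(returns, dtype=float)
    sigma = r.std(ddof=1)
    above = np.sum((r >= 0.0) & (r <= sigma))
    below = np.sum((r >= -sigma) & (r < 0.0))
    return above / (1.0 + below)

rng = np.random.default_rng(1)
honest = rng.normal(0.005, 0.02, 120)       # ten years of monthly returns
smoothed = honest.copy()
small_loss = (smoothed > -0.01) & (smoothed < 0.0)
smoothed[small_loss] = 0.001                # nudge small losses just positive
print(bias_ratio(honest), bias_ratio(smoothed))
```

Turning small losses into small gains inflates the first-quadrant count and depletes the second-quadrant count, so the smoothed series shows a markedly higher ratio than the honest one.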
Examples and Context
Natural Bias Ratios of asset returns
Table 2
The Bias Ratios of market and hedge fund indices give some insight into the natural shape of
returns near zero. Theoretically one would not expect demand for markets with normally
distributed returns around a zero mean. Such markets have distributions with a Bias Ratio of less
than 1.0. Major market indices support this intuition and have Bias Ratios generally greater than
1.0 over long time periods. The returns of equity and fixed income markets as well as alpha
generating strategies have a natural positive skew that manifests in a smoothed return histogram
as a positive slope near zero. Fixed income strategies with a relatively constant positive return
(“carry”) also exhibit total return series with a naturally positive slope near zero. Cash
investments such as 90-day T-Bills have large Bias Ratios, because they generally do not
experience periodic negative returns. Consequently, the Bias Ratio is less reliable for the
theoretical hedge fund that has an un-levered portfolio with a high cash balance.
Contrast to other metrics
Bias Ratios vs. Sharpe Ratios
Since the Sharpe Ratio measures risk-adjusted returns, and valuation biases are expected to
understate volatility, one might reasonably expect a relationship between the two. For example,
an unexpectedly high Sharpe Ratio may be a flag for skeptical practitioners to detect smoothing.
The data does not support a strong statistical relationship between a high Bias Ratio and a high
Sharpe Ratio. High Bias Ratios exist only in strategies that have traditionally exhibited high
Sharpe Ratios, but plenty of examples exist of funds in such strategies with high Bias Ratios and
low Sharpe Ratios. The prevalence of low Bias Ratio funds within all strategies further
attenuates any relationship between the two.
Serial correlation
Hedge fund investors use serial correlation to detect smoothing in hedge fund returns. Market
frictions such as transaction costs and information processing costs that cannot be arbitraged
away lead to serial correlation, as do stale prices for illiquid assets. Managed prices are a
more nefarious cause for serial correlation. Confronted with illiquid, hard to price assets,
managers may use some leeway to arrive at the fund’s NAV. When returns are smoothed by
marking securities conservatively in the good months and aggressively in the bad months, a
manager adds serial correlation as a side effect. The more liquid the fund’s securities are, the less
leeway the manager has to make up the numbers.
The most common measure of serial correlation is the Ljung-Box Q-Statistic. The p-values of the
Q-statistic establish the significance of the serial correlation. The Bias Ratio compared to the
serial correlation metric gives different results.
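For reference, the Q-statistic follows directly from its definition, Q = n(n+2) Σ_k ρ_k² / (n − k). The sketch below is an illustration, not the author's implementation; it shows how a simple moving-average smoothing of uncorrelated returns induces serial correlation that the statistic detects:

```python
import numpy as np

def ljung_box_q(x, lags=6):
    """Ljung-Box Q-statistic over the first `lags` autocorrelations.

    Under the null of no serial correlation, Q is approximately
    chi-square distributed with `lags` degrees of freedom."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()
    denom = np.sum(x * x)
    q = 0.0
    for k in range(1, lags + 1):
        rho_k = np.sum(x[k:] * x[:-k]) / denom   # lag-k autocorrelation
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q

rng = np.random.default_rng(2)
white = rng.normal(size=500)                      # uncorrelated returns
# A three-period moving average mimics return smoothing
smoothed = np.convolve(white, np.ones(3) / 3, mode="valid")
print(ljung_box_q(white), ljung_box_q(smoothed))
```

The smoothed series produces a Q far above the 5% chi-square critical value (about 12.6 for six lags), while the white-noise series usually does not.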
Table 3
Serial correlations appear in many cases that are likely not the result of willful manipulation but
rather the result of stale prices and illiquid assets. Both Sun Asia and Plank are emerging market
hedge funds for which the author has full transparency and whose NAVs are based on objective
prices. However, both funds show significant serial correlation. The presence of serial
correlation in several market indices such as the JASDAQ and the SENSEX argues further that
serial correlation might be too blunt a tool for uncovering manipulation. However the two
admitted frauds, namely Bayou, an equity fund, and Safe Harbor, an MBS fund (Table 4 shows
the critical Bias Ratio values for these strategies) are uniquely flagged by the Bias Ratio in this
sample set with none of the problems of false positives suffered by the serial correlation metric.
The Bias Ratio’s unremarkable values for market indices add further credence to its
effectiveness in detecting fraud.
Practical thresholds
Figure 2
Hedge fund strategy indices cannot generate benchmark Bias Ratios because aggregated monthly
returns mask individual manager behavior. All else being equal, managers face the difficult
pricing options outlined in the introductory remarks in non-synchronous periods, and their
choices should average out in aggregate. However, Bias Ratios can be calculated at the manager
level and then aggregated to create useful benchmarks.
Table 4
Strategies that employ illiquid assets can have Bias Ratios an order of magnitude higher than
the Bias Ratios of indices representing the underlying asset class. For
example, most equity indices have Bias Ratios falling between 1.0 and 1.5. A sample of equity
hedge funds may have Bias Ratios ranging from 0.3 to 3.0 with an average of 1.29 and standard
deviation of 0.5. On the other hand, the Lehman Aggregate MBS Index had a Bias Ratio of 2.16,
while MBS hedge funds may have Bias Ratios from a respectable 1.7 to an astounding 31.0, with
an average of 7.7 and standard deviation of 7.5.
Uses and limitations
Ideally, a hedge fund investor would examine the price of each individual asset that
comprises a manager’s portfolio. In practice, limited transparency makes this impossible;
even with full transparency, time constraints would make it impractical, so the Bias Ratio
offers a more efficient way to highlight problems. The Bias Ratio can be used to
differentiate among a universe of funds within a strategy. If a fund has a Bias Ratio above the
median level for the strategy, perhaps a closer look at the execution of its pricing policy is
warranted; whereas, well below the median might warrant only a cursory inspection.
The Bias Ratio is also useful to detect illiquid assets forensically. The table above offers some
useful benchmarks. If a database search for Long/Short Equity managers reveals a fund with a
reasonable history and a Bias Ratio greater than 2.5, detailed diligence will no doubt reveal some
fixed income or highly illiquid equity investments in the portfolio.
The Bias Ratio gives a strong indication of the presence of a) illiquid assets in a portfolio
combined with b) a subjective pricing policy. Most of the valuation-related hedge fund debacles
have exhibited high Bias Ratios. However, the converse is not always true. Often managers have
legitimate reasons for subjective pricing, including restricted securities, Private Investments in
public equities, and deeply distressed securities. Therefore, it would be unwise to use the Bias
Ratio as a stand-alone due diligence tool. In many cases, the author has found that the subjective
policies causing high Bias Ratios also lead to “conservative” pricing that would receive higher
grades on a “prudent man” test than would an un-biased policy. Nevertheless, the coincidence of
historical blow-ups with high Bias Ratios encourages the diligent investor to use the tool as a
warning flag to investigate the implementation of a manager’s pricing policies.
See also
• Sharpe ratio
• Treynor ratio
• Jensen's alpha
• Sortino ratio
• Beta (finance)
• Modern portfolio theory
References
• Weinstein, Eric; Abdulali, Adil, “Hedge fund transparency: quantifying
valuation bias for illiquid assets”, June 2002, Risk.
• Abdulali, Adil; Rahl, Leslie; Weinstein, Eric, “Phantom Prices & Liquidity: The
Nuisance of Translucence”, 2002, AIMA.
• Bias Ratio: Detecting Hedge-Fund Return Smoothing
• The Madoff Case: Quantitative Beats Qualitative!
• Risk Indicator Detects When Hedge Funds Trading Illiquid Securities Are
Smoothing Returns
• Riskdata Research Shows That 30% of Funds Trading Illiquid Securities
Smooth Their Returns
• Bias ratio seen to unmask Madoff (Financial Times January 22 2009)
• Riskdata
• Pension Risk Matters
• Getmansky, Mila; Lo, Andrew; Makarov, Igor; “An Econometric Model of Serial
Correlation and Illiquidity In Hedge Fund Returns”, 2003, NBER Working Paper
No. w9571, March 2003.
• Asness, Clifford S.; Krail, Robert J.; Liew, John M., “Alternative Investments:
Do Hedge Funds Hedge?”, 2001, Journal of Portfolio Management, Volume 28,
Number 1.
• SEC Litigation Release No. 18950, October 28, 2004
• SEC Litigation Release No. 19692, May 9, 2006
• Weisman, Andrew, “Dangerous Attractions: Informationless Investing and
Hedge Fund Performance Measurement Bias”, 2002, Journal of Portfolio
Management.
• Lo, Andrew W.; “Risk Management For Hedge Funds: Introduction and
Overview”, White Paper, June, 2001.
• Ljung, G.M.; Box, G.E.P.; “On a measure of lack of fit in time series models”,
Biometrika, 65(2), pp. 297–303, 1978.
• Chan, Nicholas; Getmansky, Mila; Haas, Shane M.; Lo, Andrew; “Systemic Risk
and Hedge Funds”, 2005, NBER Draft, August 1, 2005.
Retrieved from "http://en.wikipedia.org/wiki/Bias_ratio_(finance)"
Categories: Financial ratios
Roll's critique
From Wikipedia, the free encyclopedia
Roll's critique is a famous analysis of the validity of the Capital Asset Pricing Model (CAPM).
It concerns methods to formally test the statement of the CAPM, the equation

    E(R_i) = R_f + β_im [E(R_m) - R_f]

where R_f is the risk-free rate. This equation relates the expected return of asset i, E(R_i), to
its beta β_im with respect to the return R_m of the market portfolio. The market return is
defined as the wealth-weighted sum of all investment returns in the economy.
Roll's critique makes two statements regarding the market portfolio:
1. Mean-Variance Tautology: Any mean-variance efficient portfolio R_p satisfies the CAPM
equation exactly:

    E(R_i) = R_f + β_ip [E(R_p) - R_f]
Mean-variance efficiency of the market portfolio is equivalent to the CAPM equation holding.
This statement is a mathematical fact, requiring no model assumptions.
Given a proxy for the market portfolio, testing the CAPM equation is equivalent to testing mean-
variance efficiency of the portfolio. The CAPM is tautological if the market is assumed to be
mean-variance efficient. Proof of Mean Variance Tautology.
2. The Market Portfolio is Unobservable: The market portfolio in practice would necessarily
include every single possible available asset, including real estate, precious metals, stamp
collections, jewelry, and anything with any worth. The returns on all possible investments
opportunities are unobservable.
From statement 1, validity of the CAPM is equivalent to the market being mean-variance
efficient with respect to all investment opportunities. Without observing all investment
opportunities, it is not possible to test whether this portfolio, or indeed any portfolio, is mean-
variance efficient. Consequently, it is not possible to test the CAPM.
[edit]Relationship to the APT
The mean-variance tautology argument applies to the Arbitrage Pricing Theory and all asset-
pricing models of the form

    E(R_i) = R_f + β_i1 λ_1 + ... + β_ik λ_k

where the λ_j are unspecified factors. If the factors are returns on a mean-variance efficient
portfolio, the equation holds exactly.
It is always possible to identify in-sample mean-variance efficient portfolios within a dataset of
returns. Consequently, it is also always possible to construct in-sample asset pricing models that
exactly satisfy the above pricing equation. This is an example of data dredging.
[edit]Discussion
Roll's critique has received a large number of citations in the financial economics literature[1].
The majority of these citations refer to the second statement of the critique; few papers address the
first statement. Many researchers and practitioners interpret Roll's critique as stating only "The
Market Portfolio is Unobservable."