
Variance Reduction Techniques
Contents
1 Antithetic Variables
2 Control Variates
3 Conditioning Sampling
4 Stratified Sampling (optional)
5 Importance Sampling
Introduction
 Recall that we estimate the unknown quantity θ = E(X) by
generating random numbers X1, . . . , Xn and using the sample mean

X̄ = (X1 + · · · + Xn)/n

to estimate θ.

The mean square error is

MSE(X̄) = E[(X̄ − θ)²] = Var(X̄) = Var(X)/n.

Hence, if we can obtain a different unbiased estimator of θ having a
smaller variance than X̄ does, we would obtain an improved estimator of θ.
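
For later comparison, here is a minimal raw (crude) Monte Carlo sketch in
Matlab; it is not part of the original notes, and it assumes the target
θ = E[e^U], U ~ U(0, 1), from Example 1 below:

n=1000;
u=rand(n,1);     % U1,...,Un i.i.d. ~ U(0,1)
x=exp(u);        % Xi = exp(Ui)
theta=sum(x)/n   % sample mean; the true value is e-1 = 1.7183
varhat=var(x)/n  % estimated variance of the sample mean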
Antithetic Variables
The Use of Antithetic Variables

 Suppose X1 and X2 are identically distributed random variables
having mean θ. Then

Var((X1 + X2)/2) = [Var(X1) + Var(X2) + 2Cov(X1, X2)]/4.

 If X1 and X2, rather than being independent, were negatively
correlated, the variance of (X1 + X2)/2 would be reduced.
The Use of Antithetic Variables
How do we generate negatively correlated random numbers?

Suppose we simulate U1, U2, . . . , Um, which are uniform random
numbers. Then V1 = 1 − U1, . . . , Vm = 1 − Um would also be uniform
random numbers.

Therefore, the pairs (Ui, Vi) are negatively correlated.

Actually, it can be proven that if X1 = h(U1, . . . , Um) and
X2 = h(V1, . . . , Vm), where h is a monotone function (either increasing
or decreasing) of each coordinate, then X1 and X2 have the same
distribution and are negatively correlated. (Proof: Appendix 8.10, Page 210)
The Use of Antithetic Variables
 How do we arrange for X1 and X2 to be negatively correlated?

 Step 1:
X1 = h(U1, . . . , Um)
where U1, . . . , Um are i.i.d. ~ U(0, 1), and h is a monotone function
of each of its coordinates.

 Step 2:
X2 = h(1 − U1, . . . , 1 − Um),
which has the same distribution as X1.
What Does “Antithetic” Mean?

“Antithetic” means:
opposed to, or
the opposite of, or
negatively correlated with…

The idea is to determine the value of an output variable at random,
then determine its antithetic value (which comes from the opposite
part of its distribution), and then form the average of these two
values, using the average as a single observation.
Advantage
 The estimator has smaller variance (at least when h is
a monotone function).

 We save the time of generating a second set of
random numbers.
Example 1
Suppose we were interested in using simulation to estimate
θ = E[e^U] = ∫₀¹ e^x dx.

 If X1 = e^U1, X2 = e^U2, where U1, U2 are i.i.d. ~ U(0, 1), we have

Var((X1 + X2)/2) = Var(e^U)/2 ≈ 0.2420/2 = 0.1210,

 where Var(e^U) = E[e^2U] − (E[e^U])² = (e² − 1)/2 − (e − 1)² ≈ 0.2420.
Example 1

 h(u) = e^u is clearly a monotone function.

 X1 = e^U, X2 = e^(1−U), where U ~ U(0, 1)

 Cov(X1, X2) = E[e^U · e^(1−U)] − E[e^U]E[e^(1−U)] = e − (e − 1)² ≈ −0.2342

 Var((X1 + X2)/2) = [Var(e^U) + Cov(X1, X2)]/2 ≈ (0.2420 − 0.2342)/2 = 0.0039

 The variance reduction is 1 − 0.0039/0.1210 ≈ 96.7 percent.
Example 1: Matlab code
n=1000;
u=rand(n,1);
x=exp([u;1-u]);      % antithetic variables
theta=sum(x)/(2*n)   % estimator using antithetic variables
u0=rand(n,1);
x0=exp([u;u0]);      % independent variables
theta0=sum(x0)/(2*n) % estimator using independent variables
Example 1: Matlab code
n=1000; m=1000;
u=rand(n,m);
x=exp([u;1-u]);                % antithetic variables
theta=sum(x)/(2*n);            % one estimate per column (m replications)
true=exp(1)-1;                 % the true value is e-1=1.7183
mseav=sum((theta-true).^2)/m   % mean square error, antithetic
u0=rand(n,m);
x0=exp([u;u0]);                % independent variables
theta0=sum(x0)/(2*n);
mse0=sum((theta0-true).^2)/m   % mean square error, independent
reduction=1-mseav/mse0
Example 2
Estimate the value of the definite integral
θ = ∫₀^∞ x^0.9 e^(−x) dx.

Solution: First, we can generate values from the probability density
function f(x) = e^(−x). This is done by setting Xi = −ln Ui, where the
Ui, i = 1, . . . , n, are random numbers with Ui ~ U(0, 1).

An antithetic variable is [−ln(1 − Ui)]^0.9, so an unbiased combined
estimator is

θ̂ = (1/n) Σ_{i=1}^n (1/2){[−ln(Ui)]^0.9 + [−ln(1 − Ui)]^0.9}.
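
No Matlab code accompanies this example in the original; the following is
a minimal sketch under the same setup (the exact value of the integral is
gamma(1.9)):

n=1000;
u=rand(n,1);
x=(-log(u)).^0.9;      % [-ln(Ui)]^0.9
xa=(-log(1-u)).^0.9;   % antithetic counterpart [-ln(1-Ui)]^0.9
theta=mean((x+xa)/2)   % combined antithetic estimator
true=gamma(1.9)        % exact value: the integral equals gamma(1.9)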
Additional Example
Consider the case where we want to estimate E(X²), where
X ~ Normal(2, 1). How can we use antithetic variables to estimate
this quantity and reduce the variance?

nsim=5000;                  % set the number of simulations
out1=(2+randn(1,nsim)).^2;  % generate Normal(2,1) random variables and square them
mean(out1)
var(out1)

out2_1=2+randn(1,nsim/2);   % now use antithetic variables
out2_2=4-out2_1;            % antithetic counterpart: reflection about the mean 2
out2=0.5*(out2_1.^2 + out2_2.^2);
mean(out2)
var(out2)

% how much variance is reduced?
var_reduction=(var(out1)/nsim - var(out2)/(nsim/2))/(var(out1)/nsim)
Control Variates
The Use of Control Variates
• Assume the desired simulation quantity is θ = E[X], and
there is another simulation r.v. Y with known mean μY = E[Y].

• Then for any given constant c, the quantity
X + c(Y − μY)
is also an unbiased estimator of θ.

• Consider its variance:
Var(X + c(Y − μY)) = Var(X) + c²Var(Y) + 2c·Cov(X, Y).

• It can be shown that this variance is minimized when c is
equal to
c* = −Cov(X, Y)/Var(Y).
The Use of Control Variates
• The variance of the new estimator is:
Var(X + c*(Y − μY)) = Var(X) − [Cov(X, Y)]²/Var(Y).

• Y is called the control variate for the simulation estimator X.

• We can re-express this by dividing both sides by Var(X):
Var(X + c*(Y − μY))/Var(X) = 1 − [Corr(X, Y)]²,

where
Corr(X, Y) = Cov(X, Y)/[Var(X)Var(Y)]^(1/2),
the correlation between X and Y.

The variance is therefore reduced by 100[Corr(X, Y)]² percent.
The Use of Control Variates

Note: the goal is to choose Y so that Y ≈ X, with Y easy
to simulate and μY easy to find.

The controlled estimator based on n runs is

θ̂ = (1/n) Σ_{i=1}^n [Xi + c*(Yi − μY)] = X̄ + c*(Ȳ − μY),

and its variance is given by

Var(θ̂) = {Var(X) − [Cov(X, Y)]²/Var(Y)}/n.
Estimation
 Cov(X, Y) and Var(Y) may be unknown in advance.
 Use the sample estimators
Ĉov(X, Y) = Σ_{i=1}^n (Xi − X̄)(Yi − Ȳ)/(n − 1)
and
V̂ar(Y) = Σ_{i=1}^n (Yi − Ȳ)²/(n − 1).

 The approximation of c* is then
ĉ* = −Ĉov(X, Y)/V̂ar(Y).
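
A minimal Matlab sketch of this plug-in estimate (not in the original
notes), assuming the setting of Example 3 below (X = e^U, Y = U); the same
computation appears in the Example 4 code:

n=1000;
y=rand(n,1);
x=exp(y);               % pilot run with X = exp(U), Y = U
chat=-sum((x-mean(x)).*(y-mean(y)))/sum((y-mean(y)).^2)
% the (n-1) factors cancel; the exact value here is c* = -1.6903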
Several Variables as a Control

 We can use more than a single variable as a control. For
example, if a simulation results in output variables Yi, i = 1,
. . . , k, and E[Yi] = μi is known, then for any constants ci,
i = 1, . . . , k, we may use

X + Σ_{i=1}^k ci(Yi − μi)

as an unbiased estimator of E[X], as sketched below.
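
The original gives no worked example for multiple controls; the following
is a minimal sketch, assuming X = e^U with the two controls Y1 = U
(μ1 = 1/2) and Y2 = U² (μ2 = 1/3), with the coefficients estimated by
least squares:

n=1000;
u=rand(n,1);
x=exp(u);                        % X = exp(U), so E[X] = e-1
Y=[u, u.^2];                     % two controls: Y1 = U, Y2 = U^2
mu=[1/2, 1/3];                   % known means E[U] = 1/2, E[U^2] = 1/3
Yc=Y-mean(Y);                    % centered controls
c=-((Yc'*Yc)\(Yc'*(x-mean(x)))); % least-squares estimate of (c1, c2)
z=x+(Y-mu)*c;                    % controlled observations
theta=mean(z)                    % controlled estimate of e-1 = 1.7183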


Example 3
Estimate θ = E[e^U] = ∫₀¹ e^x dx.

 A natural control variate is the random number U itself:
 X = e^U, Y = U, where U ~ U(0, 1)
 Cov(X, Y) = E[U·e^U] − E[U]E[e^U] = 1 − (e − 1)/2 ≈ 0.14086

 Var(Y) = Var(U) = 1/12, then
c* = −Cov(X, Y)/Var(Y) = −12 × 0.14086 = −1.6903.

Thus, the controlled estimator is:

θ̂ = (1/n) Σ_{i=1}^n [e^(ui) − 1.6903(ui − 0.5)].
Example 3
 Var(X + c*(Y − μY)) = Var(e^U) − [Cov(e^U, U)]²/Var(U)
= 0.2420 − 12 × (0.14086)² ≈ 0.0039

 From Example 1, Var(e^U) = 0.2420.

 The variance reduction is
1 − 0.0039/0.2420 = 98.4 percent.
Example 3: Matlab code
n=1000;
m=1000;
y=rand(n,m);                  % control variate Y = U
x=exp(y);                     % X = exp(U)
c=-1.6903;
z=x+c*(y-0.5);                % X + c*(Y - muY)
theta=sum(z)/n;
true=exp(1)-1;
msecv=sum((theta-true).^2)/m; % mean square error, control variate
theta0=sum(x)/n;
mse0=sum((theta0-true).^2)/m; % mean square error, raw estimator
reduction=1-msecv/mse0
Example 4:
Suppose we wanted to estimate θ, where θ = ∫₀¹ e^(x²) dx.

a) Explain how control variables may be used to estimate θ.

b) Do 100 simulation runs, using the control given in (a), to
estimate first c* and then the variance of the estimator.

c) Explain how to use antithetic variables to estimate θ. Using the
same data as in (b), determine the variance of the antithetic
variable estimator.

d) Which of the two types of variance reduction techniques worked
better in this example?
Example 4:

a) Let X = e^(U²), so that θ = E(e^(U²)) with U ~ U(0, 1). One possible
choice of control variable is Y = U². The expected value of Y is
E[U²] = ∫₀¹ x² dx = 1/3.

So we can use the unbiased estimator of θ given by

θ̂ = (1/n) Σ_{i=1}^n [e^(Ui²) + c*(Ui² − 1/3)],

where c* = −Cov(e^(U²), U²)/Var(U²).
Example 4:
b) The following Matlab program can be used to answer the question
m = 100; U = rand(1,m); Y = U.^2; Ybar = 1/3; X = exp(Y);
Xbar = sum(X)/m;
A = sum((X-Xbar).*(Y-Ybar));
B = sum((Y-Ybar).^2);
C = sum((X-Xbar).^2);
CovXY = A/(m-1);
VarY = B/(m-1);
VarX = C/(m-1);
c = -A/B;
%Estimator:
Xc = X + c*(Y-Ybar);
Xcbar = sum(Xc)/m;
%Variance of estimator:
VarXc = (VarX - CovXY^2/VarY)/m
One run of the above program gave the following: the estimated value of c* was −1.5950,
and the variance of the estimator Xc was Var(Xc) = 4.5860 × 10^(-5).
Example 4:
c) The antithetic variable estimator is

θ̂_a = (1/m) Σ_{i=1}^m [e^(Ui²) + e^((1−Ui)²)]/2.

Matlab code:
%Antithetic estimator
Xa = (exp(U.^2)+exp((1-U).^2))/2;
Xabar = sum(Xa)/m;
VarXa = var(Xa)/m
The variance of the antithetic variable estimator (using the same
U) was: Var(Xa) = 2.7120 × 10^(-4).

d) Comparing the variances in (b) and (c) (4.5860 × 10^(-5) versus
2.7120 × 10^(-4)), it is better to use the control variable method in
this example.
Conditioning Sampling
Variance Reduction by Conditioning
Review of conditional expectation:
E[X|Y] denotes the function of the random variable Y whose
value at Y = y is E[X|Y = y].

 If X and Y are jointly discrete random variables,
E[X|Y = y] = Σ_x x·P{X = x | Y = y}.

 If X and Y are jointly continuous with joint p.d.f. f(x, y),
E[X|Y = y] = ∫ x·f(x, y) dx / fY(y), where fY(y) = ∫ f(x, y) dx.
Variance Reduction by Conditioning
• Recall the law of conditional expectations: (textbook Page 34)
E[E[X|Y]] = E[X].

This implies that E[X|Y] is also an unbiased estimator of θ = E[X].

• Now recall the conditional variance formula: (textbook Page 34)
Var(X) = E[Var(X|Y)] + Var(E[X|Y]).

Clearly, both terms on the right are non-negative, so we have
Var(E[X|Y]) ≤ Var(X).

This implies that the conditioning estimator has a smaller
variance than X itself.
Variance Reduction by Conditioning
 Procedure:

 Step 1: Generate the r.v. Y = yi.

 Step 2: Compute the (conditional) expected value
of X given Y: E[X | yi].

 Step 3: An unbiased estimate of θ is (1/n) Σ_{i=1}^n E[X|yi].
Example 5: Estimate π
 To estimate π,
recall the simulation introduced in Chapter 1:

 Vi = 2Ui − 1, i = 1, 2, where Ui ~ U(0, 1), so that (V1, V2) is
uniform on the square [−1, 1] × [−1, 1].

 Set I = 1 if V1² + V2² ≤ 1 (the point falls inside the unit circle),
and I = 0 otherwise.

 E[I] = π/4.

 Use E[I|V1] rather than I to estimate π/4.
Example 5: Estimate π

 E[I|V1 = v] = P{V2² ≤ 1 − v²} = (1 − v²)^(1/2), since V2 ~ U(−1, 1).

 Hence, E[I|V1] = (1 − V1²)^(1/2).

 Use (1 − V1²)^(1/2) as the estimator.
Example 5: Estimate π
 The variance:
Var((1 − V1²)^(1/2)) = E[1 − V1²] − (π/4)² = 2/3 − (π/4)² ≈ 0.0498.

 I is a Bernoulli r.v. having mean π/4, so
Var(I) = (π/4)(1 − π/4) ≈ 0.1686.

 The conditioning results in a 1 − 0.0498/0.1686 ≈ 70.44 percent
reduction in variance.
Example 5: Estimate π
Procedure 2:
Step 1: Generate Vi = 2Ui − 1, i = 1, . . . , n, where the Ui are i.i.d. ~ U(0, 1).

Step 2: Evaluate each (1 − Vi²)^(1/2) and take the average of all these
values to estimate π/4.

Matlab code:
n=1000;
m=1000;
u1=rand(n,m);
v1=2*u1-1;
v=(1-v1.^2).^0.5;
theta=4*sum(v)/n;
msecv=sum((theta-pi).^2)/m; % mean square error, conditioning
reduction=1-msecv/mse0      % mse0 comes from the raw simulation in the comparison program below
Example 5: Estimate π
Matlab program for comparison of the two simulation procedures:
n=1000;
m=1000;
u1=rand(n,m);
v1=2*u1-1;
% ------------ raw simulation ------------
v2=2*rand(n,m)-1;
s=v1.^2+v2.^2<=1;
theta0=4*sum(s)/n;
mse0=sum((theta0-pi).^2)/m;
% ---------- conditioning sampling ----------
v=(1-v1.^2).^0.5;
theta=4*sum(v)/n;
msecv=sum((theta-pi).^2)/m;
reduction=1-msecv/mse0      % reduction in variance
Example 6:

 Suppose that Y ~ Exp(1).

 Suppose that, conditional on Y = y, X ~ N(y, 4).

 How do we estimate θ = P{X > 1}?

 Raw simulation:
 Step 1: generate Y = −log(U), where U ~ U(0, 1)
 Step 2: given Y = y, generate X ~ N(y, 4)
 Step 3: set I = 1 if X > 1, and I = 0 otherwise;
then E[I] = θ
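
A minimal Matlab sketch of this raw simulation (not in the original notes):

n=1000;
I=zeros(1,n);
for i=1:n
y=-log(rand);   % Y ~ Exp(1) by the inverse transform method
x=y+2*randn;    % X | Y=y ~ N(y, 4); the standard deviation is 2
I(i)=(x>1);     % indicator of {X > 1}
end
theta0=mean(I)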
Example 6:
Can we express the exact value of E[I | Y = y] in terms of y?
Improvement:
 If Y = y, then (X − y)/2 is a standard normal r.v.

 Then,
E[I | Y = y] = P{X > 1 | Y = y} = P{(X − y)/2 > (1 − y)/2} = Φ̄((1 − y)/2),

where Φ̄(x) = 1 − Φ(x) and Φ is the standard normal c.d.f.

 Therefore, the average value of Φ̄((1 − y)/2) obtained over many runs
is superior to the raw simulation estimator.
Example 6:
Procedure 2:
Step 1: Generate Yi = −ln(Ui), i = 1, . . . , n, where the Ui are i.i.d. ~ U(0, 1).

Step 2: Evaluate each Φ̄((1 − Yi)/2) and take the average of all these
values to estimate θ.

Matlab code:
n=1000;
EIy=zeros(1,n);
for i=1:n
y=exprnd(1);                 % Y ~ Exp(1)
EIy(i)=1-normcdf((1-y)/2);   % E[I | Y=y]
end
theta=mean(EIy)
Example 6:
Can we use antithetic variables to improve the simulation?

Because the conditional expectation estimator Φ̄((1 − y)/2)
is monotone in y, the simulation can be improved by
using antithetic variables.

Further improvement:
 Use antithetic variables, as in Procedure 3 below.
Example 6:

Procedure 3:

Step 1: Generate U1, U2, . . . , Um and form 1 − U1, 1 − U2, . . . , 1 − Um.

Step 2: Evaluate
(1/2)[Φ̄((1 + ln(ui))/2) + Φ̄((1 + ln(1 − ui))/2)], i = 1, . . . , m,
and average all these values to estimate θ.
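
A minimal Matlab sketch of Procedure 3 (not in the original notes):

m=1000;
u=rand(1,m);
g1=1-normcdf((1+log(u))/2);    % E[I | Y = -ln(u)]
g2=1-normcdf((1+log(1-u))/2);  % antithetic counterpart using 1-u
theta=mean((g1+g2)/2)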
Example 7:

 The random variable
S = Σ_{i=1}^N Xi
is said to be a compound random variable if N is a
nonnegative integer-valued random variable and X1, X2, . . .
is a sequence of i.i.d. positive r.v.s that are independent of N.
Example 7:

 In an insurance application, Xi could represent the amount of
the i-th claim made to an insurance company, and N could
represent the number of claims made by some specified time t;
S would be the total claim amount made by time t.

 In such applications, N is often assumed to be a Poisson random
variable (in which case S is called a compound Poisson
random variable).
Example 7:
 Suppose that we want to use simulation to estimate
p = P{S > c}.

 Raw simulation:
 Step 1: generate N, say N = n
 Step 2: generate the values of X1, . . . , Xn
 Step 3: set I = 1 if Σ_{i=1}^n Xi > c, and I = 0 otherwise;
then E[I] = p
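
A minimal Matlab sketch of this raw simulation (not in the original notes),
assuming the Poisson/exponential parameters of the numerical example at the
end of this section:

n=100; c=325000;
I=zeros(1,n);
for i=1:n
N=poissrnd(300);       % number of claims in 30 days (rate 10 per day)
x=exprnd(1000,1,N);    % N claim amounts, exponential with mean 1000
I(i)=(sum(x)>c);       % indicator of {S > c}
end
p_raw=mean(I)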
Example 7:
Improvement by conditioning
 Introduce the random variable
M = min{m : Σ_{i=1}^m Xi > c}.

 What is E[I | M = m]?

 We can prove:
E[I | M = m] = P{S > c | M = m} = P{N ≥ m},
because N and M are independent.

Thus, given M = m, the value of E[I | M = m] is P{N ≥ m}.
Since the distribution of N is known (specifically, a Poisson
distribution), the probability P{N ≥ m} is easy to find.
Example 7:

Procedure of simulation improved by conditioning:

 Step 1: Generate the Xi in sequence, stopping at the first m for which
S = Σ_{i=1}^m Xi > c.

 Step 2: Calculate P{N ≥ m} as the estimate of p from this run.
Example 7:
 Suppose that N is a Poisson r.v. with a rate of 10 per day, the
amount of a claim X is an exponential r.v. with mean $1000,
and c = $325,000. Simulate the probability that the total
amount claimed within 30 days exceeds c.

Code:
n=100; c=325000; I=zeros(1,n);
for i=1:n
s=0; m=0;
while s<c
x=exprnd(1000);          % claim amount, exponential with mean 1000
s=s+x;
m=m+1;
end
p=1-poisscdf(m-1,300);   % P{N>=m}; N is Poisson with mean 10*30=300
I(i)=p;
end
p_bar=sum(I)/n
