
VARIANCE REDUCTION

(L&K Chapter 11)


Variance reduction techniques (VRTs) improve a crude simulation
experiment by
  reducing the variance of the estimators without increasing the
  computational effort, or by
  obtaining the same variance with less computational effort.

VRTs focus on point-estimator performance, but this improvement
should be reflected in a reduced measure of error.
VRTs do not necessarily change the underlying variance of the system.

147

VRTs may require additional computational or analyst effort, so you
must decide if there is a net improvement.
We present general VRTs, but VRTs work best when tailored to a
specific problem.

148

WHY DO VARIANCE REDUCTION?


Recall the Challenger accident in 1986.
After that, a friend of mine was hired by Morton Thiokol to look at
reliability estimation. MT wanted chances of failure of 10⁻⁶.
At 100 random numbers per rep, how many random numbers do we
need to observe 30 failures? 10⁶ × 100 × 30 = 3 billion
Suppose that the failure probability is 1 × 10⁻⁵ (meaning it is an
order of magnitude too large).
If we make 30 million replications, the standard error of the estimate
of Pr{failure} is approximately 2 × 10⁻⁵. This is twice as large as
the value we are estimating!
Without variance reduction a lot of simulation still gives a poor
estimate.
149

APPLICATIONS OF VARIANCE REDUCTION


Highly reliable systems
Financial engineering/quantitative finance
Simulation optimization
  Lots of alternatives
  Noisy gradient estimates
Metamodeling/mapping
Real-time control using simulation

150

VARIANCE REDUCTION IN REAL LIFE


Probably the most common example of variance reduction is public
opinion polling (very popular in the US).
One reason why small samples like 1500 give such good estimates
is that the sampling is not purely random.
Stratification among income levels, race, and political parties ensures
a more representative sample.

151
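The polling idea can be sketched in code. This is a toy illustration (a hypothetical two-stratum population with made-up stratum means, not data from the notes): proportional-allocation stratified sampling versus simple random sampling of the same total size.

```python
import random
import statistics

random.seed(1)

# Hypothetical population: two equally likely strata with very different means.
def draw_stratum(s):
    """Sample one response from stratum s (0 or 1)."""
    return random.gauss(10.0 if s else 0.0, 1.0)

def srs_estimate(n):
    """Simple random sampling: the stratum mix is random each time."""
    return statistics.mean(draw_stratum(random.random() < 0.5) for _ in range(n))

def stratified_estimate(n):
    """Proportional allocation: exactly n/2 draws from each stratum."""
    half = n // 2
    m0 = statistics.mean(draw_stratum(0) for _ in range(half))
    m1 = statistics.mean(draw_stratum(1) for _ in range(half))
    return 0.5 * m0 + 0.5 * m1   # weight by the known stratum proportions

# Repeat each experiment many times to compare estimator variability.
srs = [srs_estimate(50) for _ in range(200)]
strat = [stratified_estimate(50) for _ in range(200)]
var_srs = statistics.variance(srs)
var_strat = statistics.variance(strat)
```

With stratum means 0 and 10, most of the SRS variance comes from the random stratum mix, which stratification eliminates; both estimators target the population mean 5.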

VARIANCE REDUCTION TECHNIQUES


We will cover the following VRTs for estimating θ = E[Y], using the
highly reliable system simulation as an illustration.

Antithetic Variates (AV): Applicable in all stochastic simulations.
Attempts to balance simulation outputs by balancing the
pseudorandom numbers.
Control Variates (CV): Applicable in all stochastic simulations.
Exploits information about simulation inputs to reduce variance of
simulation outputs. Based on least-squares regression.
Conditional Expectations (CE): Not generally applicable, but
guaranteed effective when it is. Basic idea is not to simulate what
you can compute.
Importance Sampling (IS): Generally applicable. Attempts to bias
the simulation outputs toward more important areas.

We have already covered CRN; L&K add Indirect Estimation.

152

ABSOLUTE VS. RELATIVE PARAMETER


In theory any VRT could be used for any problem, but it is useful
to distinguish two classes:

Estimating the value of a parameter associated with a single
system (absolute parameter).
In this case the actual value of the parameter matters.

Estimating the difference between parameters of two or more
distinct systems (relative parameter).
In this case only the relative difference matters and not the actual
values of the parameters. Common random numbers, which we
have already covered, is the primary VRT here.

153

BACKGROUND

Cov[A, B] = E[(A − μ_A)(B − μ_B)]

Corr[A, B] = Cov[A, B] / √(Var[A] Var[B])

Let Z = Y − bX. Then

   Var[Z] = Var[Y] + b² Var[X] − 2b Cov[Y, X]

E[T] = E[E[T|S]]

Some VRTs exploit these relationships.

154
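The variance identity for Z = Y − bX can be checked exactly on a tiny discrete joint distribution (the pmf below is invented purely for illustration; exact rational arithmetic avoids rounding):

```python
from fractions import Fraction as F

# A small joint pmf for (X, Y): {(x, y): probability}.
pmf = {(0, 1): F(1, 4), (1, 2): F(1, 2), (2, 4): F(1, 4)}

def E(fn):
    """Exact expectation of fn(x, y) under the pmf."""
    return sum(p * fn(x, y) for (x, y), p in pmf.items())

mx, my = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: (x - mx) ** 2)
var_y = E(lambda x, y: (y - my) ** 2)
cov_xy = E(lambda x, y: (x - mx) * (y - my))

# Variance of Z = Y - bX two ways: directly, and via the identity.
b = F(1, 2)
mz = E(lambda x, y: y - b * x)
var_z_direct = E(lambda x, y: (y - b * x - mz) ** 2)
var_z_formula = var_y + b ** 2 * var_x - 2 * b * cov_xy
```

The two computations agree exactly for any b, which is the algebraic identity the control-variate development below relies on.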

MARKOV-PROCESS PRIMER
A continuous-time Markov process is a stochastic process {Yt; t ≥ 0}
with state space a subset of {0, 1, 2, . . .}.
Examples include queueing, reliability, inventory, combat, biological
processes.
Characteristics:
Time spent in each state is exponentially distributed.
Next state entered depends only on the current state.
MPs can be analyzed via mathematical analysis, but when the state
space is large numerical analysis or simulation may be required.

155

Suppose the state space is {1, 2, . . . , m}. The generator G of the MP
describes how Yt evolves over time:

        | g11  g12  ...  g1m |
        | g21  g22  ...  g2m |
   G =  |  :    :    :    :  |
        | gm1  gm2  ...  gmm |

For i ≠ j, 1/gij is the mean of the exponential time until the process
moves from state i to j.
   gij can be interpreted as the transition rate from i to j.
For i = j, 1/gii = 1/Σ_{j≠i} gij is the mean of the exponential holding
time in state i.
   gii can be interpreted as the transition rate out of state i.

156

TTF EXAMPLE
State space {0, 1, 2} corresponds to the number of functional computers.
B is the time until computer breakdown, exponentially distributed
with mean 1/λ, or failure rate λ.
R is the time to repair a computer, exponentially distributed with
mean 1/μ, or repair rate μ.
Generator for {Yt; t ≥ 0}, with states ordered (2, 1, 0):

        | λ   λ+0   0 |         | λ   λ    0 |
   G =  | μ   λ+μ   λ |    =    | μ  λ+μ   λ |
        | 0    0    0 |         | 0   0    0 |

Estimate θ = E[TTF] when λ = 1 per day, μ = 1000 per day, from
n = 1000 replications.

157

STATE-CHANGE PROCESS
Let {Xn; n = 0, 1, 2, . . .} be the state-change process, where n counts
the number of state changes without regard to time.
The probability of a transition from i to j is pij = gij / gii, and from i
to i is pii = 0, provided gii > 0. If gii = 0 then pii = 1.
Transition matrix for the TTF example (states ordered (2, 1, 0)):

        |    0      1      0     |     |     0      1     0    |
   P =  | μ/(λ+μ)   0   λ/(λ+μ)  |  =  | 1000/1001  0   1/1001 |
        |    0      0      1     |     |     0      0     1    |

158
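The rule pij = gij/gii is mechanical, so a small sketch can verify the TTF numbers (the dictionary-of-rates representation is my own choice, with gii taken as the total rate out of state i per the convention above):

```python
lam, mu = 1.0, 1000.0

# Transition rates g_ij for i != j; g_ii is the total rate out of state i.
G = {
    2: {1: lam},           # from state 2: one computer fails at rate lambda
    1: {2: mu, 0: lam},    # from state 1: repair (rate mu) beats failure, or not
    0: {},                 # state 0 (system failure) is absorbing
}

def transition_probs(G):
    """Embedded-chain probabilities: p_ij = g_ij / g_ii; p_ii = 1 if absorbing."""
    P = {}
    for i, rates in G.items():
        gii = sum(rates.values())
        P[i] = {j: r / gii for j, r in rates.items()} if gii > 0 else {i: 1.0}
    return P

P = transition_probs(G)
```

For λ = 1 and μ = 1000 this reproduces the matrix above: P[1][2] = 1000/1001 and P[1][0] = 1/1001.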

CODE FOR TTF


Public Sub TTF(Lambda As Double, Mu As Double, Sum As Double)
' sub to generate one replication of the ttf for the
' airline reservation system
' variables
'   State = number of operational computers
'   Lambda = computer failure rate
'   Mu = computer repair rate
'   Sum = generated ttf value that is returned from the Call
Dim State As Integer
Dim Fail As Double
Dim Repair As Double
State = 2
Sum = 0
While State > 0
    If State = 2 Then
        Fail = Exponential(1 / Lambda)
        Sum = Sum + Fail
        State = 1
    Else
        Fail = Exponential(1 / Lambda)
        Repair = Exponential(1 / Mu)
        If Repair < Fail Then
            Sum = Sum + Repair
            State = 2
        Else
            Sum = Sum + Fail
            State = 0
        End If
    End If
Wend
End Sub
159

CRUDE EXPERIMENT
Variance is reduced relative to a crude experiment.
Example: Estimate θ = E[Y] (expected TTF).
If Y1, . . . , Yn are i.i.d. then Var[Ȳ] = σ²_Y / n where
   σ²_Y = Var[Y] = E[(Y − θ)²]
VRTs try to do better than this.

number of independent replications: n
point estimator: Ȳ
interval estimator: Ȳ ± t_{1−α/2, n−1} S/√n

160
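The crude experiment is easy to sketch outside of VBA. This Python version is a translation of the same logic (a normal-approximation 1.96 is used in place of the t quantile, and the seed and replication count are my own choices):

```python
import random
import statistics

def ttf(lam, mu, rng):
    """One replication of time to system failure, starting with 2 computers up."""
    state, total = 2, 0.0
    while state > 0:
        if state == 2:
            total += rng.expovariate(lam)   # time until a breakdown
            state = 1
        else:
            fail = rng.expovariate(lam)
            repair = rng.expovariate(mu)
            if repair < fail:
                total += repair
                state = 2
            else:
                total += fail
                state = 0
    return total

rng = random.Random(42)
lam, mu, n = 1.0, 1000.0, 1000
ys = [ttf(lam, mu, rng) for _ in range(n)]
ybar = statistics.mean(ys)
half = 1.96 * statistics.stdev(ys) / n ** 0.5   # approx 95% half-width
```

First-step analysis gives E[TTF] = (2λ + μ)/λ² = 1002 days for these rates, so the crude point estimate lands near that, with a fairly wide interval.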

INVERSE CDF REVIEW


To generate X ~ F we can set
   X = F⁻¹(U)
with U ~ U(0, 1).
If we want X1 ~ F1 and X2 ~ F2 but can otherwise choose the joint
distribution then...
the minimum Cov[X1, X2] (and correlation) occurs when
   X1 = F1⁻¹(U) and X2 = F2⁻¹(1 − U)
and the maximum Cov[X1, X2] (and correlation) occurs when
   X1 = F1⁻¹(U) and X2 = F2⁻¹(U)

161
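The extreme-correlation property is easy to see empirically. A sketch with two exponential marginals (rates 1 and 2 chosen arbitrarily):

```python
import math
import random
import statistics

def exp_inv(u, rate):
    """Inverse cdf of the exponential(rate) distribution."""
    return -math.log(1.0 - u) / rate

def corr(a, b):
    """Sample correlation of two equal-length sequences."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)
    return cov / (statistics.stdev(a) * statistics.stdev(b))

random.seed(7)
us = [random.random() for _ in range(10000)]
x1 = [exp_inv(u, 1.0) for u in us]
x2_common = [exp_inv(u, 2.0) for u in us]        # same U: maximum correlation
x2_anti = [exp_inv(1.0 - u, 2.0) for u in us]    # 1 - U: minimum correlation

corr_common = corr(x1, x2_common)
corr_anti = corr(x1, x2_anti)
```

For two exponentials the common-U pair is a deterministic scaling (correlation 1), while the antithetic pair comes out strongly negative (around −0.64).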

ANTITHETIC VARIATES
AV applies to estimating an absolute parameter.
The idea is to balance bad system performance with good system
performance to obtain a better estimate of mean performance.
This is accomplished by balancing the pseudorandom numbers across
replications.
Rather than balance across all n replications, we typically balance
across pairs of replications.
We hope to end up with negatively correlated (antithetic) pairs of
outputs.

162

INDUCING NEGATIVE CORRELATION


On replication 2i − 1 use U1, U2, . . .
On replication 2i use 1 − U1, 1 − U2, . . .
Use different random numbers on replications
(2i − 1, 2i) and (2j − 1, 2j) for i ≠ j (implies independent pairs).
Let Ȳj = (Y_{2j−1} + Y_{2j})/2, j = 1, 2, . . . , n/2 and

   Ȳ = (1/(n/2)) Σ_{j=1}^{n/2} Ȳj

   Var[Ȳ] = (σ²_Y / n)(1 + ρ)

where ρ = Corr[Y_{2j−1}, Y_{2j}].
163
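A minimal AV sketch (estimating E[e^U] = e − 1 with U ~ U(0, 1), a toy problem rather than the TTF model) shows the (1 + ρ) factor at work:

```python
import math
import random
import statistics

random.seed(11)
npairs = 10000

pair_means = []
for _ in range(npairs):
    u = random.random()
    y1 = math.exp(u)          # replication 2i-1 uses U
    y2 = math.exp(1.0 - u)    # replication 2i uses 1 - U
    pair_means.append(0.5 * (y1 + y2))

theta_hat = statistics.mean(pair_means)

# Compare: pair means built from two independent draws instead.
indep_means = [0.5 * (math.exp(random.random()) + math.exp(random.random()))
               for _ in range(npairs)]
var_av = statistics.variance(pair_means)
var_indep = statistics.variance(indep_means)
```

Here ρ = Corr[Y_{2j−1}, Y_{2j}] is about −0.97 because e^u is monotone in u, so the variance of a pair mean shrinks to roughly (1 + ρ) times the independent-pairs value.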

AV EXPERIMENT
number of independent replications: n/2
point estimator: Ȳ
interval estimator: Ȳ ± t_{1−α/2, n/2−1} S √(2/n)
where

   S² = (1/(n/2 − 1)) Σ_{j=1}^{n/2} (Ȳj − Ȳ)²

It is possible to induce an antithetic effect among k-tuples rather
than pairs.
The generation schemes tend to be complicated and there is no
general optimal transformation for k > 2.

164

IMPLEMENTATION
In replications 2i − 1 and 2i we want U and 1 − U to be used for the
same purpose. Assigning a distinct stream to each input process
helps.
The random variate generators in many simulation languages have
calling sequences like
   NORMAL(mean, stddev, stream)
Supposedly if you use −stream you obtain the antithetic variates.
However, if the generator is not inverse cdf then there may be no
antithetic effect.

165

CODE FOR AV
To keep the random numbers synchronized, we run the pairs together
until the last one quits.
Public Sub TTFAV(Lambda As Double, Mu As Double, _
Sum As Double, AVSum As Double)
' sub to generate one pair of antithetic replications of the ttf
' for the airline reservation system
' variables
'   State = number of operational computers
'   Lambda = computer failure rate
'   Mu = computer repair rate
'   Sum = generated ttf value that is returned from the Call
'   AVSum = antithetic ttf value that is returned from the Call
Dim State As Integer
Dim AVState As Integer
Dim Fail As Double
Dim Repair As Double
Dim U1 As Double
Dim U2 As Double
State = 2
AVState = 2
Sum = 0
AVSum = 0
166

While State > 0 Or AVState > 0
    ' simulate until the pair of runs is complete
    U1 = MRG32k3a()    ' random number for failure
    U2 = MRG32k3a()    ' random number for repair
    If State > 0 Then
        Fail = -VBA.Log(1 - U1) / Lambda
        Repair = -VBA.Log(1 - U2) / Mu
        If State = 2 Then
            Sum = Sum + Fail
            State = 1
        Else
            If Repair < Fail Then
                Sum = Sum + Repair
                State = 2
            Else
                Sum = Sum + Fail
                State = 0
            End If
        End If
    End If

167

    ' antithetic run
    If AVState > 0 Then
        Fail = -VBA.Log(U1) / Lambda
        Repair = -VBA.Log(U2) / Mu
        If AVState = 2 Then
            AVSum = AVSum + Fail
            AVState = 1
        Else
            If Repair < Fail Then
                AVSum = AVSum + Repair
                AVState = 2
            Else
                AVSum = AVSum + Fail
                AVState = 0
            End If
        End If
    End If
Wend
End Sub

168

EFFECTIVENESS OF AV
If ρ < 0 then Var[Ȳ_AV] < Var[Ȳ]
If ρ < ρ*(α, n) then

   E[t_{1−α/2, n/2−1} S √(2/n)] < E[t_{1−α/2, n−1} S/√n]

The function ρ*(α, n) approaches 0 as n increases; for n ≥ 100,
ρ*(α, n) ≈ 0 for α = 0.01, 0.05, 0.1.

169

CONDITIONAL EXPECTED VALUE


Suppose S and T have joint distribution

              T
   S        1       2       3
   1       2/10    1/10    1/10   |  4/10
   2       1/20    8/20    3/20   | 12/20
           5/20   10/20    5/20

Then
   E[T] = (1)(5/20) + (2)(10/20) + (3)(5/20) = 2
The conditional distribution of T given S is

   a                    1      2      3
   Pr{T = a|S = 1}     2/4    1/4    1/4     (sums to 1.00)
   Pr{T = a|S = 2}     1/12   8/12   3/12    (sums to 1.00)

170

Therefore,
   E[T|S = 1] = (1)(2/4) + (2)(1/4) + (3)(1/4) = 7/4
   E[T|S = 2] = (1)(1/12) + (2)(8/12) + (3)(3/12) = 26/12

A fundamental result from mathematical statistics is that

   E[T] = E_S[E_{T|S}[T|S]]

In our case

   E_S[E_{T|S}[T|S]] = E[T|S = 1] Pr{S = 1} + E[T|S = 2] Pr{S = 2}
                     = (7/4)(4/10) + (26/12)(12/20) = 2

For variance reduction, these results imply that we can use E[T|S]
to estimate E[T].
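The table arithmetic can be verified exactly in a few lines (the joint pmf is encoded directly; exact rationals keep the check clean):

```python
from fractions import Fraction as F

# Joint pmf Pr{S = s, T = t} from the table.
joint = {
    (1, 1): F(2, 10), (1, 2): F(1, 10), (1, 3): F(1, 10),
    (2, 1): F(1, 20), (2, 2): F(8, 20), (2, 3): F(3, 20),
}

# Marginal of S, then the conditional expectations E[T | S = s].
pS = {s: sum(p for (si, t), p in joint.items() if si == s) for s in (1, 2)}
ET_given_S = {s: sum(t * p for (si, t), p in joint.items() if si == s) / pS[s]
              for s in (1, 2)}

# E[T] directly, and via the double expectation E_S[E[T|S]].
ET_direct = sum(t * p for (s, t), p in joint.items())
ET_double = sum(ET_given_S[s] * pS[s] for s in (1, 2))
```

Both routes give E[T] = 2, which is the double expectation theorem in action.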

CONTROL VARIATES
In CV we approximate E[Y|C] as

   E[Y|C] ≈ β0 + β1(C − μ_C)            (1)

where μ_C = E[C]. Therefore

   θ = E[Y] = E[E[Y|C]] = β0

If we observe (Yi, Ci − μ_C), i = 1, 2, . . . , n, then we can estimate β0
(and thus θ) via a least-squares regression.
If Y and C are strongly correlated, then this estimator will have
smaller variance than Ȳ.

171

The least-squares estimator of θ = β0 is

   β̂0 = Ȳ − β̂1(C̄ − μ_C)

where

   β̂1 = Σ_{i=1}^n (Yi − Ȳ)(Ci − C̄) / Σ_{i=1}^n (Ci − C̄)²

How does β̂0 compare to Ȳ?

172

Result: If the linear model is correct, then E[β̂1] = β1.

Proof: Let C = (C1, C2, . . . , Cn)′. Then

   E[β̂1|C = c] = Σ E[(Yi − Ȳ)|C = c](ci − c̄) / Σ (ci − c̄)²
               = Σ (β0 + β1(ci − μ_C) − β0 − β1(c̄ − μ_C))(ci − c̄) / Σ (ci − c̄)²
               = β1.

By the double expectation theorem, E[β̂1] = β1.

Result: If the linear model is correct then E[β̂0] = β0.
Proof:
   E[β̂0|C = c] = E[Ȳ|C = c] − E[β̂1|C = c](c̄ − μ_C)
               = β0 + β1(c̄ − μ_C) − β1(c̄ − μ_C) = β0.
Then the double expectation theorem gives the result.

173

Notice that

   Var[β̂0] = Var[E[β̂0|C]] + E[Var[β̂0|C]] = E[Var[β̂0|C]]

since Var[E[β̂0|C]] = Var[β0] = 0 from the proof of the result above.

Under the special assumption of constant conditional variance
(Var[Y|C] = σ²), the following result can be derived:
Result: If the linear model is correct and we have constant
conditional variance then

   Var[β̂0|C] = σ² (1/n + (C̄ − μ_C)² / Σ (Ci − C̄)²).

174

If, further, (Y, C) are jointly normally distributed with correlation ρ,
then σ² = (1 − ρ²)σ²_Y and

   E[(C̄ − μ_C)² / Σ (Ci − C̄)²] = 1/(n(n − 3)).

Result: If (Y, C) are bivariate normal, then

   Var[β̂0] = ((n − 2)/(n − 3)) (1 − ρ²) σ²_Y / n.

Thus, if ρ² > 1/(n − 2) then the control-variate estimator has smaller
variance than the sample mean.

175

The sce and confidence interval for θ are the usual ones for the
intercept term of a least-squares regression.
Regression set up:

   | Y1 |     | 1  C1 − μ_C |
   | Y2 |     | 1  C2 − μ_C |  | β0 |
   | :  |  =  | :     :     |  | β1 |  + ε  =  Cβ + ε
   | Yn |     | 1  Cn − μ_C |

   β̂ = (C′C)⁻¹ C′Y
   β̂0 ± t_{1−α/2, n−2} sce
   V̂ar[β̂] = (Y′Y − β̂′C′Y)(C′C)⁻¹ / (n − 2)

176
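The regression machinery reduces to a few lines for a single control. A sketch on a toy pair (Y, C) with known μ_C (my own example, not the TTF model), comparing the crude mean Ȳ to the intercept estimator β̂0 over repeated experiments:

```python
import random
import statistics

def one_experiment(rng, n=100):
    """Return (crude mean, CV estimator) for Y = C + noise, with E[C] = 1."""
    cs = [rng.expovariate(1.0) for _ in range(n)]    # control; mu_C = 1 known
    ys = [c + rng.gauss(0.0, 1.0) for c in cs]       # output; E[Y] = 1
    ybar, cbar = statistics.mean(ys), statistics.mean(cs)
    b1 = (sum((y - ybar) * (c - cbar) for y, c in zip(ys, cs))
          / sum((c - cbar) ** 2 for c in cs))
    mu_c = 1.0
    b0 = ybar - b1 * (cbar - mu_c)                   # CV point estimator
    return ybar, b0

rng = random.Random(3)
crude, cv = zip(*(one_experiment(rng) for _ in range(400)))
var_crude = statistics.variance(crude)
var_cv = statistics.variance(cv)
```

Here Corr[Y, C] = 1/√2, so ρ² = 1/2 and the bivariate-normal result suggests roughly a 50% variance reduction, which the macroreplication variances reflect.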

CV for TTF
Possible control variates:
We know the distribution of the time until a computer breakdown.
   C1 = average time until a computer breakdown (E[C1] = 1/λ)
The number of times the process enters state 1 has a geometric
distribution with parameter λ/(λ + μ).
   C2 = number of times process is in state 1
   (E[C2] = (λ + μ)/λ)

177

CODE FOR CV
Public Sub TTFCV(Lambda As Double, Mu As Double, Sum As Double, _
CV1 As Double, CV2 As Double)
' sub to generate one replication of the ttf for the
' airline reservation system and record two control variates
' variables
'   State = number of operational computers
'   Lambda = computer failure rate
'   Mu = computer repair rate
'   Sum = generated ttf value that is returned from the Call
'   Sum1 = sum associated with control variate number 1,
'          average computer failure time
'   Count1 = counter associated with CV1
'   Count2 = counter for CV2, the number of times process is in state 1
'   CV# = control variate number #
Dim State As Integer
Dim Fail As Double
Dim Repair As Double
Dim Sum1 As Double
Dim Count1 As Double
Dim Count2 As Double
State = 2
Sum = 0
Sum1 = 0
Count1 = 0
Count2 = 0

178

While State > 0
    If State = 2 Then
        Fail = Exponential(1 / Lambda)
        Sum = Sum + Fail
        State = 1
        ' record control variate data
        Sum1 = Sum1 + Fail
        Count1 = Count1 + 1
    Else
        Fail = Exponential(1 / Lambda)
        Repair = Exponential(1 / Mu)
        ' record control variate data
        Sum1 = Sum1 + Fail
        Count1 = Count1 + 1
        Count2 = Count2 + 1
        If Repair < Fail Then
            Sum = Sum + Repair
            State = 2
        Else
            Sum = Sum + Fail
            State = 0
            ' compute control variates
            CV1 = Sum1 / Count1 - 1 / Lambda
            CV2 = Count2 - (Lambda + Mu) / Lambda
        End If
    End If
Wend
End Sub
179

Comments about CV:

When the linear relationship does not hold, the CV estimator is
biased. Usually not a problem with large samples, but there are
remedies.

To estimate a probability p = Pr{Y ≤ a}, let
   Zi = I(Yi ≤ a)
   Ci = I(Xi ≤ b) for X with known probability near p and strongly
   correlated with Y.
Then do least-squares regression of Z on C − μ_C.

Control-variate estimators for quantiles are obtained by inverting
the probability estimator above.
180

Multiple control variates can be used (via a multiple regression)
but make sure n is much larger than the number of control
variates q.
Regression set up:

   | Y1 |     | 1  C11 − μ1  ...  C1q − μq |  | β0 |
   | Y2 |     | 1  C21 − μ1  ...  C2q − μq |  | β1 |
   | :  |  =  | :     :       :      :     |  | :  |  + ε
   | Yn |     | 1  Cn1 − μ1  ...  Cnq − μq |  | βq |

If Y and C are jointly normally distributed

   Var[β̂0] = ((n − 2)/(n − q − 2)) (1 − R²_{YC}) Var[Ȳ]

Again, the usual sce and confidence interval for β0 apply:

   β̂0 ± t_{1−α/2, n−q−1} sce

CONDITIONAL EXPECTATIONS
CE is useful when E[Y|X] is known for all X, because
   E[E[Y|X]] = θ
and
   Var[E[Y|X]] = Var[Y] − E[Var[Y|X]]

Instead of observing Y1, . . . , Yn, we observe X1, . . . , Xn and let

   Ȳce = (1/n) Σ_{i=1}^n E[Y|Xi]

CE gives a guaranteed variance reduction.

181

CE EXPERIMENT
number of independent replications: n
point estimator: Ȳce
interval estimator: Ȳce ± t_{1−α/2, n−1} S/√n

where

   S² = (1/(n − 1)) Σ_{i=1}^n (E[Y|Xi] − Ȳce)²

182

CE for TTF
We can write
   TTF = H1 + H2 + ⋯ + HN
where Hn is the holding time in the nth state entered, and N is the
number of states entered before system failure.
But
   E[Hn|Xn = 2] = 1/λ
   E[Hn|Xn = 1] = 1/(λ + μ)
Thus we condition on X = (X1, . . . , XN):
   Yce = E[H1|X1] + E[H2|X2, X1] + ⋯ + E[HN|XN, . . . , X1]
       = E[H1|X1] + E[H2|X2] + ⋯ + E[HN|XN]

183
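A Python sketch of the CE estimator: simulate only the state-change chain (the next state is a coin flip with the transition probability μ/(λ + μ)) and accumulate expected holding times. The rate μ = 10 is my own choice, smaller than the example's 1000 just to keep the runs short:

```python
import random
import statistics

def ttf_ce(lam, mu, rng):
    """One CE replication: sum E[H_n | X_n] along the simulated state path."""
    h1, h2 = 1.0 / (lam + mu), 1.0 / lam   # expected holding times in states 1, 2
    p12 = mu / (lam + mu)                  # chance a repair beats a failure
    state, total = 2, 0.0
    while state > 0:
        if state == 2:
            total += h2
            state = 1
        else:
            total += h1
            state = 2 if rng.random() < p12 else 0
    return total

rng = random.Random(5)
lam, mu, n = 1.0, 10.0, 2000
est = statistics.mean(ttf_ce(lam, mu, rng) for _ in range(n))
```

First-step analysis gives an analytical mean of 12 days for these rates, and the CE estimate clusters around it; only the randomness in the path length N remains.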

CODE FOR CE
Public Sub TTFCE(Lambda As Double, Mu As Double, Sum As Double)
' sub to generate one replication of the ttf for the
' airline reservation system using conditional expectations
' variables
'   State = number of operational computers
'   Lambda = computer failure rate, Mu = computer repair rate
'   Sum = generated ttf value that is returned from the Call
'   HoldingTime# = expected holding time in state #
Dim State As Integer
Dim Fail As Double
Dim Repair As Double
Dim HoldingTime1 As Double
Dim HoldingTime2 As Double
HoldingTime1 = 1 / (Lambda + Mu)
HoldingTime2 = 1 / Lambda
State = 2
Sum = 0
While State > 0
    If State = 2 Then
        Fail = Exponential(1 / Lambda)
        Sum = Sum + HoldingTime2
        State = 1
    Else
        Fail = Exponential(1 / Lambda)
        Repair = Exponential(1 / Mu)
        If Repair < Fail Then
            Sum = Sum + HoldingTime1
            State = 2
        Else
            Sum = Sum + HoldingTime1
            State = 0
        End If
    End If
Wend
End Sub

184

IMPORTANCE SAMPLING
Suppose we represent

   θ = ∫_A g(z) f(z) dz

(θ could be a probability if g is an indicator function).
Provided f′(z) > 0 when f(z) > 0, we can rewrite θ as

   θ = ∫_A g(z) [f(z)/f′(z)] f′(z) dz

This is now an expectation with respect to f′.
If g(z)f(z)/f′(z) is nearly constant for all z, then we have reduced
variance.
In fact, if f′ = gf/θ then the variance is 0!

185

The random variable f(Z)/f′(Z) is called the likelihood ratio (LR).
Frequently, Z = (Z1, Z2, . . . , ZN) are independent so that

   LR = f(Z)/f′(Z) = ∏_{i=1}^N f(Zi) / ∏_{i=1}^N f′(Zi)

In dynamic simulation N can become quite large, making this term
unstable.
Selecting f′ is not easy in general, and a bad selection can increase
variance.

186
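A sketch of IS on a rare-event probability, using a simpler target than TTF: p = Pr{X > 5} for X ~ exponential(1), so p = e⁻⁵ ≈ 0.0067. Sample from a tilted density f′ = exponential(0.2) (the tilt rate is my own choice) and reweight by the LR:

```python
import math
import random
import statistics

random.seed(9)
n, a = 20000, 5.0
rate, rate_biased = 1.0, 0.2   # f = exp(1); f' = exp(0.2) puts mass past a

def lr(x):
    """Likelihood ratio f(x)/f'(x) for the two exponential densities."""
    return (rate * math.exp(-rate * x)) / (rate_biased * math.exp(-rate_biased * x))

samples = [random.expovariate(rate_biased) for _ in range(n)]
is_est = statistics.mean((x > a) * lr(x) for x in samples)
true_p = math.exp(-rate * a)
```

A crude estimator would see the event on only about 0.67% of replications; under f′ about 37% of samples land past a, and the LR corrects the resulting bias.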

IS for TTF
If we change the failure rate to λ′ and the repair rate to μ′ then
the LR for Yt is

   LR = [∏_{i=1}^{N_B} λ e^{−λB_i}] [∏_{j=1}^{N_R} μ e^{−μR_j}]
        / ([∏_{i=1}^{N_B} λ′ e^{−λ′B_i}] [∏_{j=1}^{N_R} μ′ e^{−μ′R_j}])

      = λ^{N_B} e^{−λ Σ_{i=1}^{N_B} B_i} μ^{N_R} e^{−μ Σ_{j=1}^{N_R} R_j}
        / ((λ′)^{N_B} e^{−λ′ Σ_{i=1}^{N_B} B_i} (μ′)^{N_R} e^{−μ′ Σ_{j=1}^{N_R} R_j})

If we only simulate the state-change process Xn then the LR is

   LR = ∏_{n=1}^N p_{X_{n−1},X_n} / ∏_{n=1}^N p′_{X_{n−1},X_n}

Try changing λ = 1 to λ′ = 1/2.

187

CODE FOR IS
Public Sub TTFIS(Lambda As Double, Mu As Double, _
LambdaPrime As Double, MuPrime As Double, _
Sum As Double, LikelihoodRatio As Double)
' sub to generate one replication of the ttf for the
' airline reservation system using importance sampling
' variables
'   State = number of operational computers
'   Lambda = computer failure rate
'   Mu = computer repair rate
'   LambdaPrime = biased computer failure rate
'   MuPrime = biased computer repair rate
'   Sum = generated ttf value that is returned from the Call
'   LikelihoodRatio = likelihood ratio
Dim State As Integer
Dim Fail As Double
Dim Repair As Double
' initialize likelihood ratio
LikelihoodRatio = 1
State = 2
Sum = 0

188

While State > 0
    If State = 2 Then
        Fail = Exponential(1 / LambdaPrime)
        Sum = Sum + Fail
        State = 1
        LikelihoodRatio = LikelihoodRatio * Lambda * Exp(-Lambda * Fail) / _
            (LambdaPrime * Exp(-LambdaPrime * Fail))
    Else
        Fail = Exponential(1 / LambdaPrime)
        Repair = Exponential(1 / MuPrime)
        If Repair < Fail Then
            Sum = Sum + Repair
            State = 2
            LikelihoodRatio = LikelihoodRatio * Lambda * Exp(-Lambda * Fail) _
                / (LambdaPrime * Exp(-LambdaPrime * Fail)) _
                * Mu * Exp(-Mu * Repair) / (MuPrime * Exp(-MuPrime * Repair))
        Else
            Sum = Sum + Fail
            State = 0
            LikelihoodRatio = LikelihoodRatio * Lambda * Exp(-Lambda * Fail) _
                / (LambdaPrime * Exp(-LambdaPrime * Fail)) _
                * Mu * Exp(-Mu * Repair) / (MuPrime * Exp(-MuPrime * Repair))
            Sum = Sum * LikelihoodRatio
        End If
    End If
Wend
End Sub

189

CODE FOR IS+CE

Public Sub TTFISCE(Lambda As Double, Mu As Double, _
LambdaPrime As Double, MuPrime As Double, _
Sum As Double, LikelihoodRatio As Double)
' sub to generate one replication of the ttf for the
' airline reservation system using conditional expectations
' and importance sampling
' variables
'   State = number of operational computers
'   Lambda = computer failure rate
'   Mu = computer repair rate
'   LambdaPrime = biased computer failure rate
'   MuPrime = biased computer repair rate
'   Sum = generated ttf value that is returned from the Call
'   HoldingTime# = expected holding time in state # (under the true rates)
'   Product = importance sampling accumulator
'   LikelihoodRatio = likelihood ratio
'   p12 = correct transition probability from state 1 to 2
'   is12 = biased transition probability from state 1 to 2
Dim State As Integer
Dim Fail As Double
Dim Repair As Double
Dim HoldingTime1 As Double
Dim HoldingTime2 As Double
Dim p12 As Double
Dim is12 As Double
Dim Product As Double

190

' compute expected holding times & transition probabilities
HoldingTime1 = 1 / (Lambda + Mu)
HoldingTime2 = 1 / Lambda
p12 = Mu / (Lambda + Mu)
is12 = MuPrime / (LambdaPrime + MuPrime)
Product = 1
State = 2
Sum = 0
While State > 0
    If State = 2 Then
        Fail = Exponential(1 / LambdaPrime)
        Sum = Sum + HoldingTime2
        State = 1
    Else
        Fail = Exponential(1 / LambdaPrime)
        Repair = Exponential(1 / MuPrime)
        If Repair < Fail Then
            Sum = Sum + HoldingTime1
            State = 2
            Product = Product * (p12 / is12)
        Else
            Sum = Sum + HoldingTime1
            State = 0
            LikelihoodRatio = Product * ((1 - p12) / (1 - is12))
            Sum = Sum * LikelihoodRatio
        End If
    End If
Wend
End Sub

191

HOW DID THEY DO?


AV: a little variance reduction
CE: a little variance reduction
CV: huge variance reduction (97%)
IS: a little variance reduction
IS+CE: modest variance reduction (60%)

192

VRT EXERCISE
Problem: Estimate p = Pr{Y > a} and θ = E[Y] where
   Y = max{X1 + X4 + X6, X1 + X3 + X5 + X6, X2 + X5 + X6}
Crude Experiment:
1. sumP ← 0; sumM ← 0
2. repeat n times:
      sample X1, . . . , X6
      Y = max{X1 + X4 + X6, X1 + X3 + X5 + X6, X2 + X5 + X6}
      sumP ← sumP + I(Y > a)
      sumM ← sumM + Y
3. return p̂ = sumP/n and Ȳ = sumM/n
193

DETAILS
Let the Xi be i.i.d. exponential(1).
Take a = 6, n = 30.
Try every variance reduction technique we learned on one of the two
problems (or both).
You must demonstrate that at least one of your VRTs works on each
problem. Use the experiment-within-an-experiment approach with
m macroreplications and form a confidence interval on the variance
ratio.
Remember that the practitioner gets only one experiment, but the
researcher gets as many as necessary to establish properties.

194
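The crude experiment for this exercise can be sketched directly (one experiment only; the macroreplication wrapper, seed, and the larger n used here for illustration are left to you):

```python
import random

def crude(n, a, rng):
    """One crude experiment: returns (p_hat, y_bar) for the max-of-path-sums Y."""
    sum_p, sum_m = 0, 0.0
    for _ in range(n):
        x = [rng.expovariate(1.0) for _ in range(6)]    # X1..X6, i.i.d. exp(1)
        x1, x2, x3, x4, x5, x6 = x
        y = max(x1 + x4 + x6, x1 + x3 + x5 + x6, x2 + x5 + x6)
        sum_p += (y > a)
        sum_m += y
    return sum_p / n, sum_m / n

rng = random.Random(2024)
p_hat, y_bar = crude(10000, 6.0, rng)
```

With a = 6 the event is not terribly rare, but at the exercise's n = 30 both estimators are very noisy, which is exactly what makes the problem a good testbed for the VRTs above.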
