Utrecht University
Department of Mathematics
November 14, 2011
Contents

1 Introduction
References
1 Introduction
Random numbers are the basis of various computational problems, amongst which is Monte Carlo integration. What are random numbers and how do we obtain them? These questions have a philosophical nature, because true randomness is rare. Rather, scientists use the notion of pseudo-random numbers, which are the topic of the first part of this report. It is not possible for a computer to produce truly random numbers, because a computer is deterministic. However, there exist many algorithms for generating pseudo-random numbers, and these appear to be quite good. What is good in this sense? In [2] a practical approach is taken: a random number generator (RNG) is good when it is effective. An RNG is in turn effective when it produces statistically the same results in an application as a true RNG. This can be determined by theory, or by comparing the RNG to a true RNG, one that is based on a truly random process.
This report will first introduce a class of RNGs called linear congruential random number generators. Some of these are good and some are efficient in terms of computation time. The quality demands on RNGs are discussed, and a set of linear congruential generators will be subjected to tests of their randomness. In the second part of this report, we will use such RNGs to perform Monte Carlo integration. This technique is based on the fact that the area under a graph is proportional to the chance that a random point will fall inside this area. Monte Carlo integration will be compared to other numerical methods and to analytical integration.
2 Linear Congruential Generators

As mentioned in the introduction, although computers are completely deterministic, they are capable of producing sequences of numbers that appear random. As the name gives away, linear congruential generators are sequential RNGs: they produce sequences of random numbers in which the next number is completely determined by the previous one. Linear congruential generators are based on the following two formulas:

    f(x) = (ax + c) mod m    (1)
    x_{i+1} = f(x_i)    (2)
Here, x is the random number, a is called the multiplier, c the increment, and m the modulus. An RNG of this type can produce at most m different values, because of the modulo operation. Therefore, m is also referred to as the maximum. Another important property of an RNG is its period: after producing a certain amount n of different numbers, the RNG returns the value of x_0 again, which then leads to the same sequence x_0 to x_{n-1}. The period is bounded by the maximum m of the generator. A last notion is the seed of the generator: this is the starting value x_0, which must in some way be provided to the RNG.
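The recurrence in Equations (1) and (2) can be sketched in a few lines. The default parameters below are the Park-Miller values a = 16807, c = 0, m = 2^31 - 1 that appear later in this report:

```python
def lcg(seed, a=16807, c=0, m=2**31 - 1):
    """Generate the sequence x_{i+1} = (a * x_i + c) mod m (Equations 1-2)."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def uniform(seed, **params):
    """Divide each number by the maximum m to map it into [0, 1]."""
    m = params.get("m", 2**31 - 1)
    for x in lcg(seed, **params):
        yield x / m
```

With seed 1 and the Park-Miller parameters, the first two values are 16807 and 282475249.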
2.1 Choosing the Parameters
The task is now to find suitable values for a, c and m. The modulus m should in general be quite large, for two reasons. First, we want the period to be large, because repeated sequences of numbers are not random; as the period is bounded by m, m should be large. Secondly, we want the random numbers to be able to take many different values, because in many cases we want to approximate a continuous random variable. Computation time, however, puts a constraint on how large m and the other parameters can be. If m is 2^32 - 1, then x fits in a 32-bit integer. The part ax + c, however, can take values much higher than 2^32, and as a result the complete computation cannot be performed in 32-bit arithmetic.
This problem can be solved in some cases by using Schrage's algorithm. This algorithm provides a way of performing the operation ax mod m without exceeding m:

    ax mod m = a(x mod q) - r⌊x/q⌋    (3)

If this is positive it is the result; if not, m should be added. This holds only for r < q, where q and r are given by:

    q = ⌊m/a⌋
    r = m mod a
Another trick that is used is called controlled overflow. If m is 2^32, then in 32-bit arithmetic the modulo operation occurs automatically, because the computer deletes the most significant bits so that ax + c fits into 32 bits. Because controlled overflow is much faster than a modulo operation, a popular choice for m used to be 2^32.
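Controlled overflow can be imitated in a language with unbounded integers by masking to 32 bits. A minimal sketch; the constants a = 1103515245, c = 12345 are the well-known ANSI C rand() values, chosen here purely for illustration:

```python
M32 = 1 << 32  # m = 2^32

def lcg_overflow(x, a=1103515245, c=12345):
    """One LCG step where 'mod 2^32' comes from 32-bit wrap-around.
    In C this would be unsigned 32-bit arithmetic; in Python we imitate
    the truncation of the most significant bits with a bit mask."""
    return (a * x + c) & (M32 - 1)  # identical to (a*x + c) % 2**32
```

In C, the mask is unnecessary: storing the product in an unsigned 32-bit integer performs the truncation for free, which is exactly why this choice of m was popular.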
2.2 The Generators
Table 1 lists the generators considered in this report, with the parameters introduced in the previous section. The Park-Miller generator has been suggested by [4] as a minimal standard against which the randomness of other generators should be compared, because of its good randomness. It is also mentioned there that Park-Miller has full period, which is a good property. The Noname and Bad generators have uncommon choices for their parameters, but will serve as interesting subjects for our tests. The GSL generator is taken from the rand() functions of the GNU Scientific Library, and has been used as the standard generator on Unix for many years. As we will show, GSL has correlations in the lower bits of its random numbers, which is bad for randomness. Sun circumvented this problem by cutting off the lowest 16 bits of the GSL random numbers. As a result, Sun and GSL have the same parameters, but deliver different random numbers. Lastly, the Standard generator uses the system's built-in rand() function. It has functions to set the seed, generate the next random number, and obtain the maximum, but does not have functions to obtain the parameters a and c.
Schrage's trick cannot be applied to the GSL, Sun and Quick generators because they have c ≠ 0. Of the others, only Park-Miller and Bad have r < q. The Bad generator has m = 7^5, and as m^2 < 2^32, there is no need for Schrage's trick because the calculation can be performed in 32 bits. Consequently, the Park-Miller generator is the only generator for which I applied Schrage's algorithm. It has q = 127773 and r = 2836.
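Schrage's algorithm for the Park-Miller generator can be sketched as follows. In Python we can check the result against exact arithmetic; in C, the point is that no intermediate value exceeds the 32-bit range:

```python
A, M = 16807, 2**31 - 1  # Park-Miller multiplier and modulus

def schrage(x, a=A, m=M):
    """Compute (a * x) mod m without intermediates above m, Equation (3).
    Requires r < q, where q = m // a and r = m mod a."""
    q, r = m // a, m % a
    t = a * (x % q) - r * (x // q)
    return t if t >= 0 else t + m  # add m when the difference is negative
```

For Park-Miller this gives q = 127773 and r = 2836, as stated above, and the intermediate a(x mod q) never exceeds aq ≈ m.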
2.3 Distributions
Random numbers have different applications, which need random numbers in different ranges. We do not want to have a different RNG for every application; rather, we would like an RNG that produces random numbers that can be transformed to any range. Therefore, it is easiest to have our RNG generate random numbers with a uniform distribution on [0, 1]. This means that we want a continuous random variable with a probability density function (pdf) f(x) that is 1 for 0 ≤ x ≤ 1 and 0 otherwise. We will see two ways in which this distribution can be transformed to any other required distribution.
The values a, c and m of linear congruential generators should be chosen in such a way that the random numbers between 0 and m are distributed evenly. This is for example the case when the RNG has full period, a period of m. Such an RNG will return every number between 0 and m exactly once while generating m random numbers. If we divide the numbers of such an RNG by m, we get values between 0 and 1 which are evenly distributed. In this way, the RNG generates random numbers that appear to be from a uniform distribution.
We will next discuss two techniques for generating random numbers from a non-uniform distribution: the inversion method and the rejection method.
2.3.1 The Inversion Method
The inversion method is based on a property of the inverse of the cumulative distribution function (cdf) of a random variable. The inverse F^{-1} of a cdf F is given by:

    F^{-1}(q) = inf{u : F(u) ≥ q}    (4)
The values of a cdf range from 0 to 1: firstly, there exists no such thing as negative probability, and secondly, the probability of a random variable taking any value at all is equal to 1. Consequently, 0 < q < 1. This definition of the inverse can be explained in words as follows: it is the function that gives, for a value of q, the lowest value of u such that the cdf at u is equal to or higher than q. For cdfs that are strictly monotonically increasing and continuous, this definition reduces to the function F^{-1}(q) that gives the value u such that F(u) = q.
The inversion method is based on the following theorem, which is given, but not proven, in [1], [2] and [3].

Theorem 1 Suppose U is a random variable which is uniformly distributed on [0, 1]. Then the random variable X given by X = F^{-1}(U) has the cumulative distribution function F, where F^{-1} is the inverse of F.
Consequently, when the inverse of the cdf of the required distribution is known, the RNG can generate numbers from this distribution by simply calculating F^{-1}(x_i) for each random number x_i from the uniform distribution. However, the inverse is not always available, in which case the rejection method can be used.
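As an illustration of Theorem 1 (the exponential distribution is my choice here, not an example taken from this report): for F(x) = 1 - e^{-λx} the inverse is F^{-1}(q) = -ln(1 - q)/λ, so feeding uniform numbers through F^{-1} yields exponentially distributed numbers. A sketch, with Python's built-in uniform generator standing in for the generators discussed above:

```python
import math
import random

def exponential_inverse(q, lam=1.0):
    """F^{-1} for the exponential cdf F(x) = 1 - exp(-lam * x)."""
    return -math.log(1.0 - q) / lam

def sample_exponential(n, lam=1.0, seed=1):
    """Theorem 1: feed uniform numbers on [0, 1) through F^{-1}."""
    rng = random.Random(seed)
    return [exponential_inverse(rng.random(), lam) for _ in range(n)]
```

The mean of the resulting samples should approach 1/λ as n grows, which gives a quick sanity check of the transformation.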
2.3.2 The Rejection Method

The rejection method draws each candidate x_i from a distribution g and accepts it, based on a second uniform random number y_i on [0, 1], if

    f(x_i) ≥ c g(x_i) y_i    (6)

where the constant c is chosen such that c g(x) ≥ f(x) for all x. This is in fact the same rule as used in the example, with g(x) = 1 and c = 2 plugged in.
2.3.3 An Example

Suppose we would like to generate random numbers from the non-uniform distribution f(x) = 3x^2 on [a, b]. The easiest method is to take g(x) to be the uniform distribution on [a, b], so g(x) = 1/(b - a). We have to choose c such that the product cg is higher than or equal to the maximum of f. The maximum is either f(a) or f(b). Then, c = max{f(x) | x ∈ [a, b]} (b - a) and cg = max{f(x) | x ∈ [a, b]}.
Now generate two random numbers x_1 and y_1 from a uniform distribution on [0, 1]. We need x_1 to be from the distribution g, which is easily done by rescaling x_1 as x_1 := a + (b - a)x_1. Then Equation (6) simplifies to x_1 being accepted if:

    f(x_1) ≥ y_1 max{f(x) | x ∈ [a, b]}
I have used the Park-Miller generator to obtain 10 000 000 random numbers from f(x) = 3x^2 and counted the number of x_i's that were not accepted: the number of rejections. For a = -2 and b = 1.5 the number of rejections was 26 923 245. In this case, we generated 3.7 times more points than were needed. On the interval a = 0 and b = 3, this number is smaller; 20 004 146 rejections were counted.
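The procedure above can be sketched as follows, with Python's built-in uniform generator standing in for Park-Miller. On [0, 3] the acceptance fraction should approach ∫f / (max f · (b - a)) = 27/81 = 1/3, consistent with the roughly 20 million rejections per 10 million accepted numbers reported above:

```python
import random

def rejection_uniform(n, a=0.0, b=3.0, seed=1):
    """Rejection method with a flat envelope for f(x) = 3x^2 on [a, b].
    Returns (accepted_samples, n_rejections) after n candidates."""
    rng = random.Random(seed)
    f = lambda x: 3.0 * x * x
    fmax = max(f(a), f(b))  # the maximum of f lies at an endpoint
    accepted, rejections = [], 0
    for _ in range(n):
        x = a + (b - a) * rng.random()  # candidate from g = uniform on [a, b]
        y = rng.random()
        if f(x) >= y * fmax:            # Equation (6) with cg = max f
            accepted.append(x)
        else:
            rejections += 1
    return accepted, rejections
```

Note that this version fixes the number of candidates rather than the number of accepted samples; the report's experiments do the latter, but the acceptance fraction is the same either way.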
In order to bring down the number of rejections, we can pick g more carefully. A closer distribution is the line going through the points (a, f(a)) and (b, f(b)). We write g(x) = dx + e, where the slope of the line is given by d = (f(b) - f(a))/(b - a), and the intercept by e = f(a) - da.
Now, we need Theorem 1 to generate numbers from g. The problem is that g is not normalized, which is important for applying Theorem 1. Let G be the cdf of g, and G^{-1} its inverse. The maximum of G is not 1 anymore, but some value k. As a result, the domain of G^{-1} is not [0, 1] anymore, but [0, k]. Therefore, when using Theorem 1, we should not take G^{-1}(x) where x is uniform, but G^{-1}(kx). This scaling constant k is in fact the value of the integral of g over [a, b].
The cdf G(x) of g(x) is found by integrating:

    G(x) = ∫_a^x g(t) dt = ∫_a^x (dt + e) dt = [½dt^2 + et]_a^x
         = ½dx^2 + ex - ½da^2 - ea

As mentioned, we also need the integral of g over [a, b] for the scaling constant k. It is easy to find now:

    k = ∫_a^b g(x) dx = ½db^2 + eb - ½da^2 - ea
As G(x) is continuous and strictly increasing, the inverse is just given by the quadratic formula:

    G^{-1}(x) = ( -e + √( e^2 + 2d(½da^2 + ea + x) ) ) / d
For Equation (6), we require x_i to be of distribution g. This is done by taking x_i from the uniform distribution on [0, 1], and then transforming it by x_i := G^{-1}(k x_i). The criterion for x_i to be accepted now becomes

    f(x_i) ≥ g(x_i) y_i

where y_i is uniform on [0, 1]. We see that c = 1: because we did not normalize g, we already have g(x) ≥ f(x) for all x ∈ [a, b].
I applied this procedure to f(x) = 3x^2 on [-2, 1.5] to obtain 10 000 000 random numbers using Park-Miller. The number of rejections was 18 852 090, a reduction of more than 8 million. On the interval [0, 3], there were only 5 000 585 rejections. With the help of Figure 1 below, the difference in rejections is easily explained. The area under the red line represents all the random points (x_i, y_i) that were generated. The white part represents the rejected x_i's, and the blue part the accepted ones.

Figure 1. In blue the histogram of the accepted x_i's of the rejection method for the distribution f(x) = 3x^2. The red line is the height of the histogram of all generated x_i's.
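For f(x) = 3x^2 on [0, 3], the linear-envelope procedure can be sketched end to end: build g(x) = dx + e through (a, f(a)) and (b, f(b)), invert the unnormalized cdf G with the quadratic formula, and accept with c = 1. The acceptance fraction should approach ∫f / k = 27/40.5 = 2/3, matching the roughly 5 million rejections per 10 million accepted numbers reported above. Python's built-in uniform generator stands in for Park-Miller:

```python
import math
import random

def rejection_linear(n, a=0.0, b=3.0, seed=1):
    """Rejection method for f(x) = 3x^2 on [a, b] with the linear envelope
    g(x) = d*x + e; candidates are drawn from g via G^{-1}(k*u)."""
    rng = random.Random(seed)
    f = lambda x: 3.0 * x * x
    d = (f(b) - f(a)) / (b - a)              # slope of the envelope
    e = f(a) - d * a                         # intercept
    g = lambda x: d * x + e
    G0 = 0.5 * d * a * a + e * a             # G's constant term, G(a) = 0
    k = 0.5 * d * b * b + e * b - G0         # scaling constant = integral of g

    def G_inv(y):                            # quadratic formula for G(u) = y
        return (-e + math.sqrt(e * e + 2.0 * d * (G0 + y))) / d

    accepted, rejections = [], 0
    for _ in range(n):
        x = G_inv(k * rng.random())          # candidate with density g / k
        if f(x) >= g(x) * rng.random():      # accept with c = 1, since g >= f
            accepted.append(x)
        else:
            rejections += 1
    return accepted, rejections
```

For accepted samples from the density 3x^2/27 on [0, 3], the sample mean should approach 2.25, which gives a second sanity check.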
2.4 Tests
2.4.1 Random Walk
Because the direction in which the point moves is determined by a random number, this is called a random walk. The random walk should not show any structure, and should not repeat itself. If a part of the walk is repeated, this is likely because the generator's period is small. On the other hand, it could be that the random numbers themselves are not repeated, but only the order in which they fall into the set intervals. This is also a sign of bad randomness, because it means that not all intervals have equal chance of occurring when the previous random number is known.
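The exact interval-to-direction mapping used for the walks in Figure 2 is not spelled out in this copy, so the sketch below assumes a common variant: the unit interval is split into four equal parts, each moving the point one unit north, east, south or west.

```python
import random

def random_walk(n, seed=1):
    """2-D walk: each uniform number on [0, 1) falls into one of four
    equal intervals, each mapped to a unit step N, E, S or W.
    (This particular mapping is an assumption, not taken from the report.)"""
    rng = random.Random(seed)
    steps = [(0, 1), (1, 0), (0, -1), (-1, 0)]
    x = y = 0
    path = [(0, 0)]
    for _ in range(n):
        dx, dy = steps[int(4 * rng.random())]  # interval index 0..3
        x, y = x + dx, y + dy
        path.append((x, y))
    return path
```

Plotting the resulting path with a good generator should show no visible structure; with a short-period generator, the same loop of steps repeats.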
I have depicted the random walks for the generators of Table 1 in Figure 2 on the
next page. We notice that the Noname and Bad generators have repetitions. For
the Bad generator, the period can be easily seen by inspecting the random number
sequence: every 32nd number is the same. For the Noname generator, the period
is much larger. We also notice that the walks for the GSL and Sun generators are
the same. This is because the 16 most significant bits are the same, and division
by their maximum will put these numbers in the same interval.
Figure 2. Random walks of the RNGs of Table 1. The seed was set to 1, and
10,000,000 random numbers were generated for each generator. Left, from top to
bottom: RANDU, Park-Miller, Bad, GSL. Right: Quick, Noname, Sun, Standard.
2.4.2 Multidimensional Structure
The multidimensional structure can be shown by plotting the points (x_i, x_{i+1}), (x_{i+o}, x_{i+o+1}), ... for a sequence of random numbers x_i. Here, the dimension d is 2, and d - o is the overlap of the points. This can also be plotted for 3 dimensions. In [1], it is stated that all such points of a linear congruential generator lie on a number of hyperplanes in k-space. This becomes bad for lower dimensions and a lower number of hyperplanes, because the autocorrelations of the random numbers are then higher.
Upon inspection of the two-dimensional plots of points, I have not found any structure for any generator except the Bad generator. Its two-dimensional plot is depicted in Figure 3 below; its points lie on a clear rectangular lattice.
Figure 3. Two dimensional plot of numbers from the Bad generator with overlap 1.
When inspecting the three-dimensional structure, it is important to look from all angles. This is seen when considering the plot of the numbers from RANDU in Figure 4 below. Only from certain angles can the three-dimensional structure be seen clearly. The points of RANDU appear to lie on a number of parallel planes. I have found this same structure for the Noname generator. For the other generators, I did not find any structure.

Figure 4. Two views of the same three-dimensional plot of numbers from the RANDU generator with overlap 1.
2.4.3 Bit Correlation
2.4.4 Chi-square Test
Figure 5. Histograms of the uniformly distributed random numbers from the Park-Miller generator. Left: 1 000 numbers. Right: 1 000 000 numbers.
What is a good criterion for determining whether a distribution is good or bad, and how is the dependence on n and the bin size M incorporated? For this, we will consider the χ^2-distribution of the deviation V of the bin counts. This requires many different V's for an accurate representation. V is given by

    V = Σ_{j=1}^{M} (y_j - E_j)^2 / E_j    (7)

where E_j is the expectation value of the bin count, which equals n/M. Now we take many V's and make sure that we use different sequences of random numbers for each V. These V's have a χ^2-distribution with M - 1 degrees of freedom. This distribution is widely used in statistics to provide cut-off values for a certain confidence interval. We take a two-sided confidence interval of 95 percent, which means that we have an x_1 and x_2 between which lies 95 percent of the area of the χ^2-distribution.

I have tested all generators with 10 000 different V's. Each V divides up another sequence of 1 000 random numbers over M = 20 bins. I have counted the fraction of V's lying outside the confidence interval [x_1, x_2]. These fractions are depicted in Table 2 below.
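Equation (7) and the averaging over many V's can be sketched as follows, with Python's built-in generator standing in for the generators under test:

```python
import random

def chi2_statistic(samples, M=20):
    """Equation (7): V = sum over M bins of (y_j - E_j)^2 / E_j,
    where y_j is the observed bin count and E_j = n/M the expected one.
    The samples are assumed uniform on [0, 1)."""
    n = len(samples)
    E = n / M
    counts = [0] * M
    for u in samples:
        counts[min(int(u * M), M - 1)] += 1
    return sum((y - E) ** 2 / E for y in counts)

# Many V's from independent uniform sequences should average about M - 1,
# the mean of the chi-square distribution with M - 1 degrees of freedom.
rng = random.Random(1)
vs = [chi2_statistic([rng.random() for _ in range(1000)]) for _ in range(200)]
mean_v = sum(vs) / len(vs)
```

A perfectly even division of the samples over the bins gives V = 0, while a good generator produces V's scattering around M - 1 = 19.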
2.5 Conclusion
The following generators have shown a lack of randomness. The generators Noname and Bad have a low period, as shown by the random walk. The generators RANDU, Noname, and Bad show correlations between points in multi-dimensional space. The generators RANDU, Quick, Noname, Bad, and GSL showed correlations between bits of sequential random numbers. The Bad generator showed a clear deviation from the χ^2-distribution.

This leaves the Park-Miller, Sun, and Standard generators unblemished. According to [4], the Park-Miller generator has full period and thus produces 2^31 - 2 different points. The Sun generator has a maximum of 2^15 different numbers, a factor 2^16 less than Park-Miller. For the Standard generator, however, we do not know if it has full period. For this reason, I have chosen the Park-Miller generator for performing Monte Carlo integration.
3 Monte Carlo Integration

We can now use our RNGs from Section 2 for an application: Monte Carlo integration. This is the statistical approximation of an integral of a function over a certain domain. The simplest Monte Carlo method is the hit-or-miss integration of a one-dimensional function. This is in fact much like the rejection method for generating a non-uniform distribution. We sample points in an area of known size which contains the domain and codomain of the function over which we want to integrate. Then for each point it is determined whether it lies under the graph of the function. The fraction of such points times the area in which the points were sampled gives an estimate of the integral.
The main advantage of Monte Carlo methods for integration is their relative simplicity in multiple dimensions. They can be used to estimate integrals over complicated or strangely shaped volumes, as long as a simple second volume can be found that encloses the first. A second advantage is the good convergence of the error of Monte Carlo integration.
Here we will use Monte Carlo methods to estimate the integral, or the volume, of a d-dimensional unit sphere. We will start off with the estimate for d = 2, which is a circle.
3.1 Integration in One Dimension
3.1.1 Hit-or-Miss
The area of a two-dimensional unit circle centered at the origin equals four times the area of this circle in the upper right quadrant (x, y ≥ 0). Consequently, we can use two uniformly distributed variables on [0, 1] for the hit-or-miss integration of a function f(x). Generate two such random numbers x_1 and y_1. The probability that the point (x_1, y_1) lies under the graph of the function f(x) equals the area under f(x) in this domain and codomain. For n such random points, call n_a the number of points under the graph of f(x). The ratio n_a/n is an estimate for the area under f(x). For n → ∞, this estimate becomes exact. We would like to know how the error decreases as we increase n.
As argued by [3], n_a has a binomial distribution, for which the expected value of the (absolute) error is derived to be:

    E( |n_a/n - E(f)| ) ≈ √( E(f)(1 - E(f)) / n )    (8)

where E(|n_a/n - E(f)|) is the expected value of the absolute error: the difference between the estimate and the actual value of the integral. E(f) is the expected value of f. On the domain [a, b], this expected value, or average, of f can be used to calculate the integral as the product of width times average height: it is (b - a)E(f). Here b = 1 and a = 0, so E(f) equals the integral.
In general, the expected value of f is unknown, otherwise we would not be performing the Monte Carlo integration. It is, however, only necessary to note that the error decreases with order O(n^{-1/2}): increasing the number of random points by a factor 4 leads to a reduction of the error by a factor 2.
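The hit-or-miss scheme for a function f : [0, 1] → [0, 1] can be sketched as follows, with Python's built-in uniform generator standing in for the RNGs of Section 2:

```python
import random

def hit_or_miss(f, n, seed=1):
    """Estimate the area under f on the unit square: count the points
    (x, y) with y <= f(x) and return the ratio n_a / n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.random()
        y = rng.random()
        if y <= f(x):  # the point lies under the graph of f
            hits += 1
    return hits / n
```

For instance, f(x) = x has integral 1/2 on [0, 1], so `hit_or_miss(lambda x: x, n)` should approach 0.5 with an error of order n^{-1/2}.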
3.1.2 Simple Sampling
We discussed in the previous section that the integral of f(x) on [a, b] equals (b - a)E(f). Why not estimate E(f) directly? This is what the simple sampling Monte Carlo method does:

    E(f) ≈ (1/n) Σ_{i=1}^{n} f(x_i)    (9)
Equality holds for n → ∞; for finite n this is just another estimate. The expected value of the error of the simple sampling method is proven by [3] to be:

    E( | (1/n) Σ_{i=1}^{n} f(x_i) - E(f) | ) ≈ √( Var(f) / n )    (10)

Again, we do not know Var(f), but we note that the error again decreases with order O(n^{-1/2}).
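A sketch of simple sampling per Equation (9), extended to a general interval via the rescaling x := a + (b - a)x used earlier:

```python
import random

def simple_sampling(f, n, a=0.0, b=1.0, seed=1):
    """Equation (9): estimate E(f) by the sample mean of f(x_i) for
    uniform x_i on [a, b]; the integral is then (b - a) * E(f)."""
    rng = random.Random(seed)
    mean = sum(f(a + (b - a) * rng.random()) for _ in range(n)) / n
    return (b - a) * mean
```

For example, ∫x dx on [0, 1] is 1/2 and ∫3x^2 dx on [0, 3] is 27; both estimates converge with error of order n^{-1/2}.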
3.1.3 Area of a Circle

Take the function f(x) = √(1 - x^2) on [0, 1]. It is the quarter of the unit circle in the upper right quadrant. The integral I of this function is given by:

    I = ∫_0^1 f(x) dx = ∫_0^1 √(1 - x^2) dx = π/4
I have approximated this integral by calculating n_a/n of the hit-or-miss method, and E(f) from Equation (9). I have calculated the absolute difference of these estimates with I for several values of n and plotted them in Figure 6 below. I used the Park-Miller generator with seed 1. I have also plotted the error of the hit-or-miss method given by Equation (8). The error of simple sampling also decreases with order O(n^{-1/2}); it should therefore be parallel to the error of hit-or-miss.
Figure 6. A log-log plot, base 2, of the error against n. Blue plus signs: hit-or-miss error. Red asterisks: simple sampling error. Blue line: the expected error for hit-or-miss from Equation (8).
The errors of both hit-or-miss and simple sampling show a quite good reduction of order O(n^{-1/2}).
3.2 Integration in Multiple Dimensions
3.2.1 Hit-or-Miss in d Dimensions
When f : [0, 1]^d → [0, 1], we can use uniform distributions for the components of the points x_i. The total volume of the domain and codomain considered is 1^{d+1} = 1, and the volume I under the graph of f is given by the ratio:

    I = n_a / n    (11)
where n is the total number of random points generated. When calculating the volume of the d-dimensional unit sphere, it should be noted that it is defined on [-1, 1]^d. However, it is convenient to generate points from the uniform distribution on [0, 1]. As a result, we only estimate a fraction of the volume. In the case of the circle, this was a factor 1/4 of the total area; for a sphere it is a factor 1/8 of the volume. For a d-dimensional unit sphere, we estimate a factor 1/2^d of the total volume. Therefore, we multiply Equation (11) by 2^d to arrive at the estimate of the total volume V_d of a d-dimensional unit sphere:

    V_d = 2^d n_a / n    (12)
The function f that is needed in order to check whether a point is in the sphere is given by the Pythagorean formula in d - 1 dimensions:

    f(x_i1, x_i2, ..., x_i,d-1) = √( (x_i1)^2 + (x_i2)^2 + ... + (x_i,d-1)^2 )    (13)

Thus the criterion becomes that the point x_i lies inside the sphere if

    √( (x_i1)^2 + (x_i2)^2 + ... + (x_i,d-1)^2 ) ≤ √( 1 - (x_id)^2 )    (14)
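Equations (12) and (14) can be sketched together; the criterion (14) is equivalent to the sum of all d squared components being at most 1. The estimate is checked against the analytical volume π^{d/2} / Γ(d/2 + 1) used later in this section, with Python's built-in generator standing in for Park-Miller:

```python
import math
import random

def sphere_volume(d, n, seed=1):
    """Equation (12): V_d = 2^d * n_a / n, sampling the positive orthant
    [0, 1]^d and counting points with sum of squares <= 1 (Equation 14)."""
    rng = random.Random(seed)
    n_a = sum(1 for _ in range(n)
              if sum(rng.random() ** 2 for _ in range(d)) <= 1.0)
    return 2 ** d * n_a / n

def sphere_volume_exact(d):
    """Analytical volume of the d-dimensional unit sphere."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)
```

For d = 2 this reduces to 4 n_a/n ≈ π, and for d = 3 to 8 n_a/n ≈ 4π/3.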
3.2.2 Discrepancy Sampling

    x_i^j = a_i^1 P_j^{-1} + a_i^2 P_j^{-2} + a_i^3 P_j^{-3} + a_i^4 P_j^{-4} + a_i^5 P_j^{-5} + ...    (17)
Discrepancy sampling is best seen in practice. In Figure 7 below, I have plotted two-dimensional discrepancy-sampled points for n = 50 and n = 500. It can be seen how the points form a regular structure and spread out evenly over the plane.
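The expansion in Equation (17) is a radical-inverse construction; since the surrounding definition is partly lost in this copy, the sketch below assumes the standard van der Corput / Halton form: coordinate j of point i is obtained by reflecting the base-P_j digits of i about the radix point, with P_j the j-th prime.

```python
def radical_inverse(i, base):
    """Reflect the base-`base` digits of i about the radix point:
    i = sum a_k * base^k  ->  x = sum a_k * base^(-k-1), cf. Equation (17)."""
    x, denom = 0.0, 1.0 / base
    while i > 0:
        i, a = divmod(i, base)  # peel off the lowest digit a
        x += a * denom
        denom /= base
    return x

def halton(n, primes=(2, 3)):
    """First n points of the Halton low-discrepancy sequence,
    one prime base per dimension."""
    return [tuple(radical_inverse(i, p) for p in primes)
            for i in range(1, n + 1)]
```

In base 2 the sequence starts 1/2, 1/4, 3/4, 1/8, ..., which is exactly the regular, evenly spreading structure visible in Figure 7.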
I have estimated the volume of a d-dimensional unit sphere using the hit-or-miss Monte Carlo method and the discrepancy sampling quasi-Monte Carlo method. I have taken n = 2^26 points in each space, and calculated the error for every n equal to a power of two less than or equal to 2^26 and greater than 2^6, using the analytical value of the volume given by:

    V_d = π^{d/2} / Γ(d/2 + 1)    (18)

where Γ is the gamma function. The order of convergence of the error was then plotted using the third n-error point as a starting point. For the discrepancy sampling, I have plotted both the order O(n^{-1} log^d(n)) convergence, as well as its approximation for small d: O(n^{-1}). These points and lines are depicted in Figure 8 below for d = 2, 3, 6 and 12.
Figure 8. Four log-log plots, base 10, of the error of estimating the volume of a d-dimensional unit sphere against n. Blue dots: hit-or-miss error. Green dots: discrepancy sampling error. Blue dotted line: expected error for hit-or-miss with order O(n^{-1/2}). Green dotted line: expected error for discrepancy sampling with order O(n^{-1} log^d(n)). Green solid line: expected error for discrepancy sampling with order O(n^{-1}).
Observing Figure 8, the order O(n^{-1/2}) convergence of the error of the hit-or-miss estimate shows up quite well. The order O(n^{-1} log^d(n)) for discrepancy sampling shows up less well: for d = 6 and d = 12 the error seems to converge more like O(n^{-1/2}). For d = 2 and d = 3, the convergence of the error for discrepancy sampling does seem faster than for the hit-or-miss method.
3.3 Conclusion
The expected convergence of the error with order O(n^{-1/2}) of hit-or-miss Monte Carlo integration shows up. Referring back to the introduction of this report, this shows that our RNG, the Park-Miller generator, is a good RNG, because it works. The expected convergence of the error of simple sampling Monte Carlo integration of order O(n^{-1/2}) also shows up. For discrepancy sampling Monte Carlo integration, however, the error seems to converge more slowly than its expected order of O(n^{-1} log^d(n)). I remain inconclusive about whether discrepancy sampling or hit-or-miss Monte Carlo integration is better, although [1], [2], and [3] predict a faster convergence of the error.
References

[1] J. Gentle, Random Number Generation and Monte Carlo Methods, Springer-Verlag, New York, 1998.
[2] W. Press, S. Teukolsky, W. Vetterling, and B. Flannery, Numerical Recipes in C: The Art of Scientific Computing, second ed., Cambridge University Press, 1992.
[3] B. Fagginger Auer, A. Yzelman, A. van Dam, and A. Swart, Laboratory Class Scientific Computing: Course Book, Utrecht University, 2011.
[4] S. Park and K. Miller, Random Number Generators: Good ones are hard to find, Communications of the ACM 31 (1988), no. 10, 1192-1201.