By
Gilberto E. Urroz, Ph.D., P.E.
Distributed by
InfoClearinghouse.com
© 2001 Gilberto E. Urroz
All Rights Reserved
A "zip" file containing all of the programs in this document (and other
SCILAB documents at InfoClearinghouse.com) can be downloaded at the
following site:
http://www.engineering.usu.edu/cee/faculty/gurro/Software_Calculators/Scilab_Docs/ScilabBookFunctions.zip
The author's SCILAB web page can be accessed at:
http://www.engineering.usu.edu/cee/faculty/gurro/Scilab.html
Please report any errors in this document to: gurro@cc.usu.edu
Download at InfoClearinghouse.com 1 2001 Gilberto E. Urroz
PROBABILITY DISTRIBUTIONS
Discrete probability distributions
Bernoulli probability distribution
Binomial probability distribution
Poisson probability distribution
Geometric probability distribution
Hypergeometric probability mass function
Cumulative distribution functions for discrete probability distributions
SCILAB functions for discrete cumulative distribution functions
SCILAB function cdfbin
Discrete probability calculations through user-defined functions
Combinations
Binomial distribution
Poisson distribution
Geometric distribution
Hypergeometric distribution
Continuous probability functions
Factorials and the Gamma function
The gamma distribution
The exponential distribution
The beta distribution
The Weibull distribution
The uniform distribution
User-defined functions for continuous probability distributions
Continuous probability distributions used in statistical inference
The Normal distribution
The Student-t distribution
The Chi-squared (χ²) distribution
The F distribution
Applications of the normal distribution in data analysis
Plotting a histogram and its corresponding normal curve
Plotting data against their normal scores
The lognormal distribution
Generating synthetic data
Generating normally-distributed synthetic data
Additional applications of function rand
SCILAB function for generating synthetic data
Examples of synthetic data generation using function grand
Additional notes on function grand
Pseudo-random generators
Generating log-normally-distributed data
Generating data that follows the Weibull distribution
Generating data that follows the Student's t distribution
Generating data that follows a discrete distribution
Statistical simulation
Simulating traffic through a service station
A user-defined function to simulate traffic through a service station
Modeling traffic through a service station with random input
STIXBOX: a rudimentary statistics toolbox
Exercises
Probability Distributions
There are a number of mathematical functions that possess the properties of a probability mass
function for discrete random variables or the properties of a probability density function for
continuous random variables. In this section we introduce a number of those functions for the
calculation of probabilities. Because these probability distributions depend on a finite number
of parameters they are typically referred to as parametric distributions.
Discrete probability distributions
Some of the most useful discrete probability distributions are the Bernoulli, Binomial, Poisson,
geometric, and hypergeometric distributions. The definitions of the corresponding probability
mass and distribution functions are shown below. We also present expressions for the mean,
variance, and standard deviation of these distributions.
Bernoulli probability distribution
The Bernoulli probability distribution applies to a discrete random variable that can only take the values 0 or 1, i.e., X = 0, 1. Let the probability of X = 1 be p, i.e., $f_X(1) = p$; then $f_X(0) = 1-p$. This can be summarized as
$$f_X(x) = p^x (1-p)^{1-x}, \quad x = 0, 1.$$
The mean value of the distribution is
$$\mu_X = 0\cdot(1-p) + 1\cdot p = p.$$
The expectation of $X^2$, $E(X^2)$, is needed to calculate the variance $Var(X) = E(X^2) - \mu_X^2$. For the Bernoulli distribution,
$$E(X^2) = 0^2\cdot(1-p) + 1^2\cdot p = p,$$
and
$$Var(X) = E(X^2) - \mu_X^2 = p - p^2 = p(1-p).$$
Thus, the standard deviation is $\sigma_X = [p(1-p)]^{1/2}$.
These results can be obtained using SCILAB as follows:
-->p=poly(0,'p')
p =
p
-->X = [0,1]
X =
! 0. 1. !
-->Prob = [1-p p]
Prob =
! 1 - p p !
-->muX = X*Prob'
muX = p
-->EX2 = X.^2*Prob'
EX2 = p
-->VarX = EX2 - muX^2
VarX =
2
p - p
The Bernoulli distribution applies to simple binary experiments in which only two possible
outcomes exist: 1 or 0, yes or no, success or failure. The value of the probability of success,
p, can be obtained, for example, from the classical or from the frequency definitions of
probability. Bernoulli processes constitute the basis of the binomial and geometric
distributions presented below.
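As a numerical counterpart to the symbolic results, the following sketch evaluates the mean, variance, and standard deviation of a Bernoulli variable for one value of p. The value p = 0.3 is an assumption chosen only for illustration.

```scilab
// Bernoulli distribution with an assumed success probability p = 0.3
p = 0.3;
X    = [0 1];            // possible values of X
Prob = [1-p p];          // f_X(0) and f_X(1)
muX  = X*Prob';          // mean: sum of x*f_X(x)
EX2  = (X.^2)*Prob';     // E(X^2): sum of x^2*f_X(x)
VarX = EX2 - muX^2;      // variance
sigX = sqrt(VarX);       // standard deviation
mprintf("mean = %g, variance = %g, std = %g\n", muX, VarX, sigX);
```

The results should agree with p, p(1-p), and [p(1-p)]^(1/2), respectively.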
Binomial probability distribution
If a Bernoulli experiment with success probability p is repeated n times, the probability of
having x successes out of the n trials is given by
$$f_X(x) = \binom{n}{x} p^x (1-p)^{n-x} = \frac{\Gamma(n+1)}{\Gamma(x+1)\,\Gamma(n-x+1)}\, p^x (1-p)^{n-x}, \quad x = 0, 1, 2, \ldots, n, \quad 0 < p < 1,$$
with $\mu_X = np$, $Var(X) = np(1-p)$, and $\sigma_X = [np(1-p)]^{1/2}$.
In SCILAB, we can define the probability mass function for the Binomial distribution as
-->deff('[f]=fX(x,n,p)',
-->'f=gamma(n+1).*p.^x.*(1-p).^(n-x)./(gamma(x+1).*gamma(n-x+1))')
Next, we use this function to produce a plot of the probability mass function for n = 10, p =
0.10:
-->n=10; p=0.10; xx=[0:1:10]; yy = fX(xx,n,p);
-->xset('window',1);xset('mark',-9,2); plot2d(xx',yy',-9)
-->xtitle('Binomial pmf','x','fX(x)')
The following commands produce a plot of the cumulative distribution function:
-->yyy = [];for j = 1:n+1, yyy = [yyy sum(yy(1:j))]; end;
-->xset('window',2); xset('mark',-9,2); plot2d(xx',yyy',-9)
-->xtitle('Binomial cdf','x','FX(x)')
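The binomial pmf and its parameters can be checked numerically. The sketch below uses a function named binpmf (a hypothetical name, written with the same Gamma-function expression as fX above) and verifies that the pmf adds up to 1 and reproduces the mean np and the variance np(1-p).

```scilab
// Binomial pmf written with the Gamma function, as in the text
function f = binpmf(x, n, p)
    f = gamma(n+1).*p.^x.*(1-p).^(n-x)./(gamma(x+1).*gamma(n-x+1));
endfunction

n = 10; p = 0.10;
xx  = 0:n;
yy  = binpmf(xx, n, p);              // pmf values f_X(0), ..., f_X(n)
yyy = cumsum(yy);                    // running sums give the CDF F_X(x)

total = sum(yy);                     // should be 1
meanX = sum(xx.*yy);                 // should equal n*p
varX  = sum((xx.^2).*yy) - meanX^2;  // should equal n*p*(1-p)
mprintf("sum = %g, mean = %g, variance = %g\n", total, meanX, varX);
```

Using cumsum replaces the explicit loop that accumulates the CDF one term at a time.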
Poisson probability distribution
If X is a Binomial variable with n → ∞ and p → 0, we calculate the parameter λ = np, and define
the Poisson probability mass function as
$$f_X(x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots, \infty; \quad \lambda > 0.$$
The Poisson pmf can be used to model the number of occurrences of a certain event in a given
time period or per unit length, area or volume, if λ represents the mean number of occurrences of the
event per unit time, length, area or volume, respectively.
The Poisson distribution has the parameters $\mu_X = \lambda$, $Var(X) = \lambda$, and $\sigma_X = \lambda^{1/2}$.
In SCILAB we can define the Poisson distribution pmf as:
-->deff('[p]=fX(x,lambda)','p=exp(-lambda).*lambda.^x./gamma(x+1)')
A plot of the pmf for λ = 2.5 for values of x between 0 and 20:
-->lambda = 2.5; xx = [0:1:20]; yy =fX(xx,lambda);
-->xset('window',1);xset('mark',-9,2);plot2d(xx',yy',-9)
-->xtitle('Poisson pmf','x','fX(x)')
A plot of the corresponding cumulative distribution function follows:
-->yyy = []; for j = 1:21, yyy = [yyy sum(yy(1:j))]; end;
-->xset('window',2); xset('mark',-9,2); plot2d(xx',yyy',-9)
-->xset('window',2); xset('mark',-9,2); plot2d(xx',yyy',9)
-->xtitle('Poisson cdf','x','FX(x)')
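The relationship between the binomial and Poisson distributions can be illustrated numerically. The following sketch, with assumed values n = 1000 and λ = 2.5, checks that the mean and the variance of the Poisson pmf are both close to λ and compares the Poisson pmf with a binomial pmf having np = λ. The binomial pmf is evaluated through gammaln because gamma(n+1) overflows for n larger than about 170.

```scilab
lambda = 2.5;
x = 0:60;                                     // the tail beyond x = 60 is negligible here
pois = exp(-lambda).*lambda.^x./gamma(x+1);   // Poisson pmf as defined in the text

meanX = sum(x.*pois);                         // close to lambda
varX  = sum((x.^2).*pois) - meanX^2;          // also close to lambda

// Binomial pmf with large n and small p such that n*p = lambda
n = 1000; p = lambda/n;
k = 0:20;
logb  = gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1) + k.*log(p) + (n-k).*log(1-p);
binom = exp(logb);
maxdiff = max(abs(binom - pois(1:21)));       // largest pmf difference for x = 0..20
mprintf("mean = %g, variance = %g, max difference = %g\n", meanX, varX, maxdiff);
```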
Geometric probability distribution
Suppose that we have a Bernoulli experiment with probability of success p being repeated until
a successful outcome occurs. Let X represent the number of repetitions needed to obtain the first
success (counting the successful trial); then X can be modeled with the geometric pmf:
$$f_X(x) = p(1-p)^{x-1}, \quad x = 1, 2, \ldots, \infty; \quad 0 < p < 1.$$
The geometric distribution has the parameters $\mu_X = 1/p$, $Var(X) = (1-p)/p^2$, and $\sigma_X = (1-p)^{1/2}/p$.
The pmf for the geometric distribution and a plot of it are obtained in SCILAB by using:
-->deff('[f]=fX(p,x)','f=p.*(1-p).^(x-1)')
-->p = 0.25; xx = [1:1:20]; yy = fX(p,xx);
-->xset('window',1);xset('mark',-9,2);plot2d(xx',yy',-9)
-->xtitle('geometric pmf','x','fX(x)')
A plot of the geometric distribution CDF is shown next:
-->yyy = [];for j = 1:20, yyy = [yyy sum(yy(1:j))]; end;
-->xset('window',2); xset('mark',-9,2); plot2d(xx',yyy',-9)
-->xtitle('geometric cdf','x','FX(x)')
Hypergeometric probability mass function
Suppose that we have a finite population of N elements, out of which a < N elements are
defective. Suppose also that we take a sample of size n < N out of the population, and let X
represent the number of defective elements in the sample of size n. The probability distribution of X is
given by the following pmf:
$$f_X(x; n, a, N) = \frac{\binom{a}{x}\binom{N-a}{n-x}}{\binom{N}{n}}, \quad 0 < n < N, \quad 0 < a < N, \quad x = 0, 1, \ldots, n,$$
with $\mu_X = na/N$ and $Var(X) = na(N-a)(N-n)/(N^2(N-1))$.
To produce plots of the hypergeometric probability mass function and cumulative distribution
function, we first define a function accounting for the binomial coefficient:
-->deff('[CC]=C(n,r)','CC=gamma(n+1)./(gamma(r+1).*gamma(n-r+1))')
This function is incorporated in the definition of the hypergeometric function:
-->deff('[p]=fX(x)','p=C(a,x).*C(N-a,n-x)./C(N,n)')
Next, we produce plots of the hypergeometric pmf and CDF for N = 100, a = 25, and n = 20:
-->N=100;a=25;n=20;
-->xx=[0:1:20];yy=fX(xx);
-->xset('window',1);xset('mark',-9,2);
-->plot2d(xx',yy',-9);xtitle('Hypergeometric distribution','x','fX(x)');
-->yyy=[];for j=1:21, yyy=[yyy sum(yy(1:j))]; end;
-->xset('window',2);xset('mark',-9,2);
-->plot2d(xx',yyy',-9);xtitle('Hypergeometric distribution','x','FX(x)');
-->plot2d(xx',yyy',9)
Cumulative distribution functions for discrete probability distributions
Out of the five probability distributions presented above, namely, Bernoulli, Binomial, Poisson,
geometric, and hypergeometric, three of them represent finite populations of discrete values
(Bernoulli, Binomial, hypergeometric) and two represent infinite populations (Poisson and
geometric). For the Binomial, Poisson, and hypergeometric functions, the
cumulative distribution function is calculated using
$$F_X(x) = \sum_{k=0}^{x} f_X(k),$$
where $f_X(x)$ represents the corresponding probability mass function. (This is the definition
used to produce the CDF graphics shown in the previous examples.) The cumulative
distribution function $F_X(x)$ is defined in the same range of values of the discrete random
variable X.
For the geometric distribution, whose domain starts at x = 1, the corresponding expression is
$$F_X(x) = \sum_{k=1}^{x} f_X(k) = \sum_{k=1}^{x} p(1-p)^{k-1}, \quad x = 1, 2, 3, \ldots$$
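For the geometric distribution the sum of the pmf is a finite geometric series, so the CDF also has the closed form 1 - (1-p)^x. The following sketch, with an assumed p = 0.25, compares the summation definition with that closed form.

```scilab
p = 0.25;
x = 1:20;
pmf     = p.*(1-p).^(x-1);       // geometric pmf for x = 1, 2, ..., 20
Fsum    = cumsum(pmf);           // CDF from the summation definition
Fclosed = 1 - (1-p).^x;          // CDF from the geometric-series closed form
maxerr  = max(abs(Fsum - Fclosed));
mprintf("largest difference between the two CDFs = %g\n", maxerr);
```

The difference should be at round-off level.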
SCILAB functions for discrete cumulative distribution functions
SCILAB provides a number of functions for operations with cumulative distribution functions.
For discrete distributions the following functions are provided:
cdfbin - Binomial distribution
cdfnbn - Negative binomial distribution
cdfpoi - Poisson distribution (described in detail in Chapter 13)
Information on these functions can be obtained by using the help function. Next, we describe
the use of function cdfbin.
SCILAB function cdfbin
There are four different forms of the call to function cdfbin:
[P,Q]=cdfbin("PQ",S,Xn,Pr,Ompr)
[S]=cdfbin("S",Xn,Pr,Ompr,P,Q)
[Xn]=cdfbin("Xn",Pr,Ompr,P,Q,S)
[Pr,Ompr]=cdfbin("PrOmpr",P,Q,S,Xn)
The variable Pr in these calls represents the probability of success on any given trial, which we
refer to as p in the definition of the Bernoulli pmf shown earlier. On the other hand, Ompr
represents 1-Pr (in some references this is referred to as q = 1 - p), i.e., the probability of
failure in a given trial. The variable P represents the probability P(X ≤ S), where X ~
Binomial(Xn,Pr), while Q = 1 - P.
The first argument in the calls to function cdfbin is a string that determines which variable is
being sought, according to:
PQ - calculate probabilities, P = P(X ≤ S) and Q = 1 - P
S - calculate the inverse CDF, i.e., calculate S from P = P(X ≤ S)
Xn - calculate the number of trials (n in the definition of the pmf)
PrOmpr - calculate the probability of success in any given trial (p in the pmf definition)
Care should be exercised in keeping the proper order of the variables in the calls to the
function.
Some examples follow:
-->n = 10; x = 6; p = 0.35; q = 1-p;
-->[P,Q] = cdfbin('PQ',x,n,p,q) //Calculating probabilities
Q =
.0260243
P =
.9739757
-->n=20;p=0.35;q=1-p;P=0.75;Q=1-P;
-->x = cdfbin("S",n,p,q,P,Q) //Calculating the inverse CDF
x =
7.9132062
-->[p,q] = cdfbin("PrOmpr",P,Q,x,n) //Calculating p and q = 1-p
q =
.7391494
p =
.2608506
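The first cdfbin example can be cross-checked by adding the binomial pmf directly. This is a minimal sketch using the same values n = 10, x = 6, p = 0.35; the variable names k, pmf, and Psum are introduced here only for illustration.

```scilab
n = 10; x = 6; p = 0.35; q = 1 - p;
[P, Q] = cdfbin("PQ", x, n, p, q);     // P = P(X <= x), Q = 1 - P

k    = 0:x;
pmf  = gamma(n+1).*p.^k.*q.^(n-k)./(gamma(k+1).*gamma(n-k+1));
Psum = sum(pmf);                       // same probability by direct summation
mprintf("cdfbin: %.7f   summation: %.7f\n", P, Psum);
```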
Notes: Use help cdfnbn to learn more about the function that implements the negative
Binomial distribution. The function cdfpoi was described in detail in Chapter 13.
Discrete probability calculations through user-defined functions
Besides the few pre-programmed cumulative distribution functions provided by SCILAB,
probabilities can be calculated by defining probability mass and cumulative distribution
functions for the different distributions presented earlier. The basic definitions of
probabilities in terms of probability mass and cumulative distribution functions are:
$P(X = x) = f_X(x)$, pmf
$P(X \le x) = \sum_{k=0}^{x} f_X(k)$, cdf for Binomial, Poisson, and hypergeometric distributions
$P(X \le x) = \sum_{k=1}^{x} f_X(k)$, cdf for geometric distribution
We will define the following functions for the distributions shown earlier:
Distribution      pmf             CDF
Binomial          b(x,n,p)        B(x,n,p)
Poisson           p(x,lambda)     P(x,lambda)
geometric         g(x,p)          G(x,p)
hypergeometric    h(x,N,n,a)      H(x,N,n,a)
The following is a SCILAB script, called DiscreteProbabilityFunctions, which includes the
definitions for the eight function calls listed in the table immediately above:
//Defining discrete probability distributions
deff('[CC]=C(n,r)','CC=gamma(n+1)./(gamma(r+1).*gamma(n-r+1))') //Binomial coefficient
deff('[bb]=b(x,n,p)','bb=C(n,x).*p.^x.*(1-p).^(n-x)') //Binomial pmf
deff('[BB]=B(x,n,p)','BB=sum(b([0:1:x],n,p))') //Binomial CDF
deff('[pp]=p(x,lambda)','pp=exp(-lambda).*lambda.^x./gamma(x+1)') //Poisson pmf
deff('[PP]=P(x,lambda)','PP=sum(p([0:1:x],lambda))') //Poisson CDF
deff('[gg]=g(x,p)','gg=p.*(1-p).^(x-1)') //Geometric pmf
deff('[GG]=G(x,p)','GG=sum(g([1:x],p))') //Geometric CDF
deff('[hh]=h(x,N,n,a)','hh=C(a,x).*C(N-a,n-x)./C(N,n)') //Hypergeometric pmf
deff('[HH]=H(x,N,n,a)','HH=sum(h([0:1:x],N,n,a))') //Hypergeometric CDF
To execute the script that defines the discrete probability functions use:
-->exec('DiscreteProbabilityFunctions')
Combinations
The function C(n,r) represents combinations of n elements taken r at a time, i.e., the binomial
coefficient:
-->C(10,5)
ans =
252.
This is a vector of values of C(n,r) for n = 10, and r = 0, 1, ..., 10:
-->C10=[];for j=0:10,C10=[C10 C(10,j)]; end; C10
C10 =
! 1. 10. 45. 120. 210. 252. 210. 120. 45. 10.
1. !
Binomial distribution
For the binomial distribution with n = 10 and p = 0.25, the following call to function b(x,n,p)
calculates the probability P(X=2) = b(2,10,0.25):
-->b(2,10,0.25)
ans =
.2815676
The following is a list of values of the binomial pmf for n = 10, p = 0.25, for all possible values
of x = 0, 1, ..., 10:
-->b10=[];for j=0:10,b10=[b10 b(j,10,0.25)]; end; b10
b10 =
column 1 to 7
! .0563135 .1877117 .2815676 .2502823 .145998 .0583992
.016222 !
column 8 to 11
! .0030899 .0003862 .0000286 9.537E-07 !
The binomial CDF for x = 2, n = 10, p = 0.25 is calculated with the following call to function
B(x,n,p). This value represents P(X ≤ 2):
-->B(2,10,0.25)
ans =
.5255928
This value represents P(X>2) = 1 - P(X ≤ 2):
-->1-B(2,10,0.25)
ans =
.4744072
The following is a list of values of the binomial CDF for n = 10, p = 0.25, for all values of x =
0, 1, ..., 10:
-->B10=[];for j=0:10,B10=[B10 B(j,10,0.25)]; end; B10
B10 =
column 1 to 7
! .0563135 .2440252 .5255928 .7758751 .9218731 .9802723
.9964943 !
column 8 to 11
! .9995842 .9999704 .9999990 1. !
Poisson distribution
The pmf of the Poisson distribution can be used to calculate probabilities such as P(X=2) for λ =
5.2:
-->p(2,5.2)
ans =
.0745840
For P(X=6), the Poisson distribution with λ = 5.2 produces:
-->p(6,5.2)
ans =
.1514803
The cumulative distribution function for the Poisson distribution, with λ = 5.2, provides the
probability P(X ≤ 6):
-->P(6,5.2)
ans =
.7323933
The following SCILAB commands produce a vector of values of the Poisson cdf for x = 1, 2, ...,
10, and λ = 5.2:
-->P10=[];for j=1:10, P10=[P10 P(j,5.2)]; end; P10
P10 =
column 1 to 7
! .0342027 .1087867 .2380655 .406128 .580913 .7323933
.8449216 !
column 8 to 10
! .9180650 .9603256 .9823011 !
Geometric distribution
The probabilities P(X=3) and P(X=5) using the geometric distribution with p = 0.50 are
calculated as:
-->g(3,0.50)
ans =
.125
-->g(5,0.50)
ans =
.03125
The following example shows a way to calculate a vector of values of the geometric
distribution pmf for x = 1, 2, ..., 10:
-->g([1:10],0.5)
ans =
column 1 to 9
! .5 .25 .125 .0625 .03125 .015625 .0078125
.0039063 .0019531 !
column 10
! .0009766 !
The following evaluations of the geometric distribution cdf are used to calculate the
probabilities P(X ≤ 6), P(X ≤ 3), and P(X ≤ 1), respectively:
-->G(6,0.5)
ans =
.984375
-->G(3,0.5)
ans =
.875
-->G(1,0.5)
ans =
.5
A vector of values of the geometric distribution CDF, with p = 0.5, is produced by using the
following commands:
-->G10=[];for j=1:10, G10=[G10 G(j,0.5)]; end; G10
G10 =
column 1 to 9
! .5 .75 .875 .9375 .96875 .984375 .9921875
.9960938 .9980469 !
column 10
! .9990234 !
Hypergeometric distribution
The next line assigns values to the parameters N, n, and a in the hypergeometric distribution:
-->N=100;n=20;a=35;
The probability P(X=12) for the hypergeometric distribution with the parameters N, n, and a defined
above is calculated as:
-->h(12,N,n,a)
ans =
.0078581
The cumulative distribution function for the hypergeometric distribution for x = 12 is
calculated as follows:
-->H(12,N,n,a)
ans =
.9976693
The value just calculated represents the probability P(X ≤ 12). The next statement generates a
vector of values of the hypergeometric pmf for x = 0, 1, 2, ..., 20:
-->h([0:20],N,n,a)
ans =
column 1 to 7
! .0000529 .0008046 .0055295 .0228093 .0633073 .1256018
.1847085 !
column 8 to 14
! .2060210 .1768671 .1179114 .0613139 .0248839 .0078581
.0019176 !
column 15 to 21
! .0003575 .0000501 .0000051 3.698E-07 1.761E-08 4.924E-10
6.060E-12 !
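The next line builds a vector element by element with a loop. As written, the loop calls the pmf h(j,N,n,a), so the vector H10 repeats the pmf values for x = 1, 2, ..., 10; calling H(j,N,n,a) instead would give the corresponding values of the CDF: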
-->H10=[];for j=1:10,H10=[H10 h(j,N,n,a)]; end; H10
H10 =
column 1 to 7
! .0008046 .0055295 .0228093 .0633073 .1256018 .1847085
.2060210 !
column 8 to 10
! .1768671 .1179114 .0613139 !
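The hypergeometric CDF for all values of x can also be obtained in a single step by applying cumsum to the vector of pmf values. The sketch below repeats the definitions of C and h from the script so that it is self-contained, and uses the same parameters N = 100, n = 20, a = 35.

```scilab
function CC = C(n, r)
    // binomial coefficient through the Gamma function
    CC = gamma(n+1)./(gamma(r+1).*gamma(n-r+1));
endfunction
function hh = h(x, N, n, a)
    // hypergeometric pmf
    hh = C(a,x).*C(N-a,n-x)./C(N,n);
endfunction

N = 100; n = 20; a = 35;
x   = 0:n;
pmf = h(x, N, n, a);       // P(X = 0), ..., P(X = n)
cdf = cumsum(pmf);         // cdf(k+1) holds P(X <= k)
meanX = sum(x.*pmf);       // compare with n*a/N
mprintf("P(X<=12) = %.7f, mean = %g\n", cdf(13), meanX);
```

The entry cdf(13) corresponds to x = 12 and should reproduce the value returned by H(12,N,n,a) above.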
Continuous probability functions
In this section we describe several continuous probability distributions including the gamma,
exponential, beta, and Weibull distributions. Some of these distributions make use of the
Gamma function, Γ(x), which is defined next.
__________________________________________________________________________________
Factorials and the Gamma function (see also Chapter 13)
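The Gamma function is defined by
$$\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x}\,dx.$$
This function has the property that
$$\Gamma(\alpha) = (\alpha-1)\,\Gamma(\alpha-1), \quad \text{for } \alpha > 1;$$
therefore, it can be related to the factorial of a number, i.e., $\Gamma(n+1) = n!$ for a non-negative integer n.
The gamma distribution
The pdf for the gamma distribution is given by
$$f_X(x) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1} \exp(-x/\beta), \quad \text{for } x > 0, \; \alpha > 0, \; \beta > 0.$$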
-->[P,Q]=cdfgam("PQ",0.5,2,3)
Q =
.5578254
P =
.4421746
The next call to function cdfgam calculates the inverse of the gamma CDF, i.e., the value of x for which
P = P(X<x) = 0.4, where X follows the gamma distribution with α = 2, β = 3:
-->x=cdfgam('X',2,3,0.4,0.6)
x =
.4588071
The next call to the function is used to calculate the shape parameter, α, given a probability P
= P(X<0.3) = 0.6, Q = 1-P = 0.4, with X following the gamma distribution with a scale parameter
β = 2:
-->alpha = cdfgam('Shape',2,0.6,0.4,0.3)
alpha =
.7190660
The next call to function cdfgam calculates the scale parameter, β, given a probability P =
P(X<1.2) = 0.2, Q = 1-P = 0.8, with X following the gamma distribution with α = 3:
-->beta = cdfgam('Scale',0.2,0.8,1.2,3)
beta =
1.2792035
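The following sketch cross-checks the first cdfgam example and shows a round trip through the inverse CDF. For a shape parameter of 2 the gamma CDF has a closed form; the value P = 0.4421746 printed above coincides with 1 - (1 + 3x)exp(-3x) at x = 0.5, which suggests that the third argument of cdfgam multiplies x, i.e., that it acts as a rate. This interpretation is an inference from the printed numbers, so it is worth confirming with help cdfgam in the SCILAB version at hand. The variable name third is hypothetical.

```scilab
x = 0.5; shape = 2; third = 3;          // third: the argument called Scale in the text
[P, Q] = cdfgam("PQ", x, shape, third);

// Closed form for shape = 2 if the third argument multiplies x
Prate = 1 - (1 + third*x)*exp(-third*x);

// Round trip: recover x from P and Q with the inverse CDF
xback = cdfgam("X", shape, third, P, Q);
mprintf("P = %.7f, closed form = %.7f, x recovered = %.7f\n", P, Prate, xback);
```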
The exponential distribution
The exponential distribution is the gamma distribution with α = 1. Its pdf is given by
$$f_X(x) = \frac{1}{\beta}\exp(-x/\beta), \quad x > 0, \; \beta > 0,$$
while its cdf is given by
$$F_X(x) = 1 - \exp(-x/\beta), \quad \text{for } x > 0, \; \beta > 0.$$
Parameters of the exponential distribution include:
$$\mu_X = \beta, \quad \sigma_X = \beta.$$
The beta distribution
The pdf for the beta distribution is given by
$$f_X(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}, \quad 0 < x < 1, \; \alpha > 0, \; \beta > 0.$$
As in the case of the gamma distribution, the corresponding cdf for the beta distribution is
also given by an integral with no closed-form solution.
The parameters of the beta distribution include
$$\mu_X = \frac{\alpha}{\alpha+\beta}, \quad Var(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}.$$
SCILAB provides function cdfbet for operations with the cumulative distribution function of the
beta distribution. Calls to the function are the following:
[P,Q]=cdfbet("PQ",X,Y,A,B)
[X,Y]=cdfbet("XY",A,B,P,Q)
[A]=cdfbet("A",B,P,Q,X,Y)
[B]=cdfbet("B",P,Q,X,Y,A)
In these calls P = P(XX<X), where XX is the beta-distributed random variable, Y = 1 - X, Q = 1 - P, and A and B are the parameters α and β of the beta
distribution.
Next, we present some applications of function cdfbet. The first example calculates the
probability P(X<0.35) for the beta distribution with α = 2, β = 3:
-->[P,Q]=cdfbet('PQ',0.35,1-0.35,2,3)
Q =
.5629813
P =
.4370187
An example that calculates the inverse function of the beta cdf, i.e., the value of x for which P
= P(X<x) = 0.75, for the beta distribution with α = 3, β = 5 is presented next:
-->[X,Y] = cdfbet("XY",3,5,0.75,1-0.75)
Y =
.5139030
X =
.4860970
The next two examples show how to obtain the parameters α and β of the beta distribution.
The first uses X = 0.3, Y = 1-X = 0.7, P = P(X<0.3) = 0.4, and Q = 1-P = 0.6, with β = 3.5, to obtain α.
The second uses X = 0.8, Y = 0.2, P = 0.6, and Q = 0.4, with α = 1.5, to obtain β:
-->alpha = cdfbet("A",3.5,0.4,0.6,0.3,0.7)
alpha =
2.0459494
-->beta = cdfbet("B",0.6,0.4,0.8,0.2,1.5)
beta =
.7453948
The Weibull distribution
The pdf for the Weibull distribution is given by
$$f_X(x) = \alpha\beta\, x^{\beta-1} \exp(-\alpha x^{\beta}), \quad \text{for } x > 0, \; \alpha > 0, \; \beta > 0,$$
while the corresponding cdf is given by
$$F_X(x) = 1 - \exp(-\alpha x^{\beta}), \quad \text{for } x > 0, \; \alpha > 0, \; \beta > 0.$$
Parameters of this distribution are:
$$\mu_X = \frac{\Gamma(1+1/\beta)}{\alpha^{1/\beta}}, \quad Var(X) = \frac{\Gamma(1+2/\beta) - \left[\Gamma(1+1/\beta)\right]^2}{\alpha^{2/\beta}}.$$
The uniform distribution
The uniform distribution for a continuous random variable is defined for values of X such that a
< x < b. The corresponding probability density function is given by
$$f_X(x) = \frac{1}{b-a}, \quad a < x < b.$$
The cumulative distribution function is
$$F_X(x) = \frac{x-a}{b-a}, \quad a < x < b.$$
The parameters of the uniform distribution are:
$$\mu_X = \frac{a+b}{2}, \quad Var(X) = \frac{(b-a)^2}{12}.$$
The following function definition implements the cumulative distribution function for the
uniform distribution in SCILAB:
-->deff('[FF]=FX(x)','FF=(x-a)/(b-a)')
For values of a = 2.5 and b = 3.2, we proceed to calculate some probabilities:
--> a = 2.5; b = 3.2;
First, we calculate P(X<2.7) = $F_X(2.7)$:
-->FX(2.7)
ans =
.2857143
Next, we calculate P(X>3) = 1 - P(X<3) = 1 - $F_X(3)$:
-->1-FX(3)
ans =
.2857143
The following example calculates P(2.8<X<3) = P(X<3) - P(X<2.8) = $F_X(3) - F_X(2.8)$:
-->FX(3)-FX(2.8)
ans =
.2857143
User-defined functions for continuous probability distributions
The following SCILAB script defines the probability density function and the cumulative distribution
function for four selected continuous distributions: gamma, exponential, beta, and Weibull.
The script is called ContinuousProbabilityFunctions, and is invoked by using:
-->exec('ContinuousProbabilityFunctions')
The listing of the script is the following:
//Define selected continuous probability functions
deff('[gg]=gam(x,a,b)','gg=x.^(a-1).*exp(-x./b)./(b.^a.*gamma(a))')
deff('[GG]=GAM(x,a,b)','GG=intg(0,x,gam)')
deff('[ee]=eex(x,b)','ee=exp(-x./b)./b')
deff('[EE]=EEX(x,b)','EE=1-exp(-x./b)')
deff('[bb]=bet(x,a,b)',...
'bb=gamma(a+b).*x.^(a-1).*(1-x).^(b-1)./(gamma(a).*gamma(b))')
deff('[BB]=BET(x,a,b)','BB=intg(0,x,bet)')
deff('[ww]=w(x,a,b)','ww=a.*b.*x.^(b-1).*exp(-a.*x.^b)')
deff('[WW]=W(x,a,b)','WW=1-exp(-a.*x.^b)')
The functions defined through the script are summarized in the following table:
Distribution    pdf             CDF
gamma           gam(x,α,β)      GAM(x,α,β)
exponential     eex(x,β)        EEX(x,β)
beta            bet(x,α,β)      BET(x,α,β)
Weibull         w(x,α,β)        W(x,α,β)
Applications of these functions follow, starting with the gamma distribution.
The gamma distribution
First, we plot the pdf of the distribution using α = 2 and β = 3:
-->xx=(0:0.1:20);yy=gam(xx,2,3);
-->plot(xx,yy,'x','fX(x)','gamma distribution')
A plot of the gamma distribution CDF for α = 2 and β = 3 is obtained by using:
-->yyy=[];for x=0:0.1:20, yyy=[yyy GAM(x,2,3)]; end;
-->plot(xx,yyy,'x','FX(x)','gamma distribution')
The CDF can be used to calculate probabilities. The next three lines calculate the following
probabilities: P(X<5) = $F_X(5)$, P(6<X<11) = $F_X(11) - F_X(6)$, and P(X>7.5) = 1 - P(X<7.5) = 1 - $F_X(7.5)$:
-->GAM(5,2,3)
ans = .4963317
-->GAM(11,2,3)-GAM(6,2,3)
ans = .2867187
-->1-GAM(7.5,2,3)
ans = .2872975
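For α = 2 the gamma CDF can be integrated by parts, which gives F_X(x) = 1 - (1 + x/β)exp(-x/β). The sketch below compares this closed form with a numerical integration of the pdf through intg for α = 2, β = 3. The function name gampdf is hypothetical, and the parameters are fixed inside it so that intg receives a function of x only.

```scilab
function y = gampdf(x)
    a = 2; b = 3;                                // shape and scale used in the text
    y = x^(a-1)*exp(-x/b)/(b^a*gamma(a));        // gamma pdf
endfunction

F5num    = intg(0, 5, gampdf);                   // numerical integration of the pdf
F5closed = 1 - (1 + 5/3)*exp(-5/3);              // closed form for a = 2, b = 3
mprintf("numerical: %.7f   closed form: %.7f\n", F5num, F5closed);
```

Both values should agree with the result of GAM(5,2,3) shown above.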
The exponential distribution
The following commands generate plots of the pdf and CDF for the exponential distribution
using β = 2.5:
-->xx=(0:0.1:20);yy=eex(xx,2.5);
-->plot(xx,yy,'x','fX(x)','exponential distribution')
-->yyy=[];for x=0:0.1:20, yyy=[yyy EEX(x,2.5)]; end;
-->plot(xx,yyy,'x','FX(x)','exponential distribution')
The following probabilities are calculated next for the exponential distribution with β = 2.5:
P(X<6) = $F_X(6)$, P(X>4) = 1 - P(X<4) = 1 - $F_X(4)$, and P(4<X<6) = $F_X(6) - F_X(4)$:
-->EEX(6,2.5)
ans =
.9092820
-->1-EEX(4,2.5)
ans =
.2018965
-->EEX(6,2.5)-EEX(4,2.5)
ans =
.1111786
The beta distribution
To plot the pdf and CDF of the beta distribution with α = 2.5, β = 3.5, we use:
-->xx=(0:0.05:1);yy=bet(xx,2.5,3.5);
-->plot(xx,yy,'x','fX(x)','beta distribution')
-->yyy=[];for x=0:0.05:1, yyy=[yyy BET(x,2.5,3.5)]; end;
-->plot(xx,yyy,'x','FX(x)','beta distribution')
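The following probabilities are calculated next for the beta distribution with α = 2.5 and β = 3.5:
P(X<0.25) = $F_X(0.25)$, P(X>0.75) = 1 - P(X<0.75) = 1 - $F_X(0.75)$, and P(0.3<X<0.8) = $F_X(0.8) - F_X(0.3)$.
Note that the beta pdf requires the factor (1-x)^(β-1). The values printed below are consistent with the exponent β on that factor, so they are not true beta probabilities: with mean α/(α+β) ≈ 0.417, the probability P(X>0.75) is far smaller than 0.425.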
-->BET(0.25,2.5,3.5)
ans =
.1737696
-->1-BET(0.75,2.5,3.5)
ans =
.4250376
-->BET(0.8,2.5,3.5)-BET(0.3,2.5,3.5)
ans =
.3428804
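Probabilities for the beta distribution can also be obtained from the built-in function cdfbet, which provides an independent check on a user-defined CDF. The following sketch evaluates the three probabilities above for α = 2.5, β = 3.5 and compares one of them with a direct integration of the pdf. The function name betpdf is hypothetical.

```scilab
function y = betpdf(x)
    a = 2.5; b = 3.5;                            // parameters fixed for use with intg
    y = gamma(a+b)*x^(a-1)*(1-x)^(b-1)/(gamma(a)*gamma(b));
endfunction

a = 2.5; b = 3.5;
[P1, Q1] = cdfbet("PQ", 0.25, 1-0.25, a, b);     // P1 = P(X < 0.25)
[P2, Q2] = cdfbet("PQ", 0.75, 1-0.75, a, b);     // Q2 = P(X > 0.75)
[P3, Q3] = cdfbet("PQ", 0.80, 1-0.80, a, b);
[P4, Q4] = cdfbet("PQ", 0.30, 1-0.30, a, b);
Pmid  = P3 - P4;                                 // P(0.3 < X < 0.8)

total = intg(0, 1, betpdf);                      // a pdf must integrate to 1
P1num = intg(0, 0.25, betpdf);                   // should agree with P1
mprintf("P(X<0.25) = %g, P(X>0.75) = %g, P(0.3<X<0.8) = %g\n", P1, Q2, Pmid);
```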
The Weibull distribution
Plots of the pdf and CDF for the Weibull distribution with α = 2 and β = 3 are obtained as
follows:
-->xx=(0:0.01:2);yy=w(xx,2,3);
-->plot(xx,yy,'x','fX(x)','Weibull distribution')
-->yyy=[];for x=0:0.01:2, yyy=[yyy W(x,2,3)]; end;
-->plot(xx,yyy,'x','FX(x)','Weibull distribution')
The following probabilities are calculated next for the Weibull distribution with α = 2 and β = 3:
P(X<1.5) = $F_X(1.5)$, P(X>0.6) = 1 - P(X<0.6) = 1 - $F_X(0.6)$, and P(0.5<X<1.2) = $F_X(1.2) - F_X(0.5)$:
-->W(1.5,2,3)
ans =
.9988291
-->1-W(0.6,2,3)
ans =
.6492094
-->W(1.2,2,3)-W(0.5,2,3)
ans =
.7472451
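The mean of the Weibull distribution written as f_X(x) = αβx^(β-1)exp(-αx^β) is Γ(1+1/β)/α^(1/β). A minimal sketch for α = 2, β = 3 compares this expression with a numerical integration of x·f_X(x). The function name xw and the upper integration limit of 3 (beyond which the pdf is negligible for these parameters) are choices made only for this illustration.

```scilab
function y = xw(x)
    a = 2; b = 3;                                // Weibull parameters used in the text
    y = x*(a*b*x^(b-1)*exp(-a*x^b));             // x times the Weibull pdf
endfunction

a = 2; b = 3;
mean_num     = intg(0, 3, xw);                   // numerical mean
mean_formula = gamma(1 + 1/b)/a^(1/b);           // mean from the Gamma function
mprintf("numerical mean: %.7f   formula: %.7f\n", mean_num, mean_formula);
```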
Continuous probability distributions used in statistical inference
Statistical inference is the process by which sample data is used to provide information about
the population. Some of the products of statistical inference are the generation of confidence
intervals and the test of hypotheses for population parameters. There are a number of
continuous probability distributions of great utility in statistical inference. These are:
the standard normal distribution
the Student's t distribution
the Chi-square (χ²) distribution
the F distribution
The probability density functions (pdf) for these distributions are presented below:
The Normal distribution
The expression for the normal distribution pdf is:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right],$$
where μ is the mean, and σ² the variance of the distribution.
SCILAB provides function cdfnor for operations with the cumulative distribution function for the
normal distribution. Function cdfnor was presented in detail in an earlier chapter. To find on-line
information on this function use the command:
-->help cdfnor
The Student-t distribution
The Student-t, or simply, the t-, distribution has one parameter ν, known as the degrees of
freedom. The probability density function (pdf) is given by
$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\pi\nu}} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \quad -\infty < t < \infty.$$
The following SCILAB commands can be used to plot the pdf for the Student t distribution with ν = 6:
-->deff('[f]=fT(t,nu)',...
-->'f=gamma((nu+1)./2).*(1+t.^2./nu).^(-(nu+1)/2)/(sqrt(%pi*nu)*gamma(nu/2))')
-->tt=[-4:0.1:4];ff=fT(tt,6);
-->plot(tt,ff,'t','fT(t)','Student t - nu = 6')
SCILAB provides function cdft for operations with the cumulative distribution function of the
Student's t distribution. The calls to the function are as follows:
[P,Q]=cdft("PQ",T,Df)
[T]=cdft("T",Df,P,Q)
[Df]=cdft("Df",P,Q,T)
In these function calls, P = P(TT<T), Q = 1 - P, Df = degrees of freedom = ν, with TT ~ Student
t(Df).
-->[P,Q] = cdft("PQ",0.4,6) //Probability calculation
Q =
.3515041
P =
.6484959
-->t = cdft("T",8,0.45,1-0.45) //Inverse CDF calculation
t =
- .1297073
-->nu = cdft("Df",0.7,0.3,0.8) //Obtaining degrees of freedom
nu =
.7716700
A plot of the CDF for the Student t distribution can be produced using the following commands:
-->xx=[-4:0.1:4];
-->yy=[];for x=-4:0.1:4, yy=[yy cdft('PQ',x,6)]; end;
-->plot(xx,yy,'t','fX(t)','Student t - nu = 6')
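Two properties of the Student t CDF are easy to verify with cdft: the distribution is symmetric about zero, so P(T < -t) = P(T > t), and the inverse call recovers the original value of t. The sketch below uses ν = 6 and t = 0.4, the values of the first example above.

```scilab
nu = 6;
[P, Q]   = cdft("PQ",  0.4, nu);     // P = P(T < 0.4), Q = P(T > 0.4)
[Pm, Qm] = cdft("PQ", -0.4, nu);     // Pm = P(T < -0.4)

// Symmetry about zero: Pm should equal Q
tback = cdft("T", nu, P, Q);         // inverse CDF: should return 0.4
mprintf("P = %.7f, Q = %.7f, P(T<-0.4) = %.7f, t recovered = %.7f\n", P, Q, Pm, tback);
```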
The Chi-squared (χ²) distribution
The Chi-squared (χ²) distribution has one parameter ν, known as the degrees of freedom. The
probability density function (pdf) is given by
$$f(x) = \frac{1}{2^{\nu/2}\,\Gamma\left(\frac{\nu}{2}\right)}\, x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}, \quad \nu > 0, \; x > 0.$$
A plot of the CDF for the Chi-square distribution with ν = 4 can be obtained by using:
-->xx = [0:0.1:10];
-->yy=[];for x=0:0.1:10, yy=[yy cdfchi('PQ',x,4)]; end;
-->plot(xx,yy,'x','FX(x)','Chi-square - nu = 4')
SCILAB provides function cdfchi for operations with the cumulative distribution function of the
χ² (chi-square) distribution. The calls to this function include:
[P,Q]=cdfchi("PQ",X,Df)
[X]=cdfchi("X",Df,P,Q)
[Df]=cdfchi("Df",P,Q,X)
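A brief sketch of typical uses of cdfchi follows. For ν = 2 the chi-square pdf reduces to (1/2)exp(-x/2), so its CDF is 1 - exp(-x/2), which gives a closed form to compare against. The second part obtains the 95th percentile for ν = 4 and verifies it by evaluating the CDF at that point. The numerical choices (x = 3, P = 0.95) are assumptions made for illustration.

```scilab
// nu = 2: closed-form CDF 1 - exp(-x/2)
[P, Q]  = cdfchi("PQ", 3, 2);
Pclosed = 1 - exp(-3/2);

// 95th percentile for nu = 4, then check by evaluating the CDF there
x95 = cdfchi("X", 4, 0.95, 0.05);
[Pchk, Qchk] = cdfchi("PQ", x95, 4);
mprintf("P = %.7f, closed form = %.7f, x95 = %.4f, CDF(x95) = %.4f\n", P, Pclosed, x95, Pchk);
```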
The standard normal distribution has mean value μ = 0 and standard deviation σ = 1.
SCILAB provides function cdfnor for operations with the normal cumulative distribution
function. The different forms of the call to the function were presented in detail in an earlier chapter,
and are repeated here:
[p,q] = cdfnor("PQ",x,mu,sigma)
[x] = cdfnor("X",mu,sigma,p,q)
[mu] = cdfnor("Mean",sigma,p,q,x)
[sigma] = cdfnor("Std",p,q,x,mu)
where mu is the mean value (μ), sigma is the standard deviation (σ), p = P(X<x), and q = 1 - p =
P(X>x). The first argument in the different calls to cdfnor is a string that indicates the type of
result expected:
"PQ" - to request probabilities p and q
"X" - to request a value of the normal variable
"Mean" - to request the mean of the distribution
"Std" - to request the standard deviation of the distribution
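The four forms of the call can be exercised together on the standard normal distribution. In this sketch the value x = 1.96 is an assumption chosen for illustration; the last two calls recover the mean and the standard deviation from the probabilities computed in the first call.

```scilab
mu = 0; sigma = 1;
[p, q] = cdfnor("PQ", 1.96, mu, sigma);        // p = P(X < 1.96), q = 1 - p
x = cdfnor("X", mu, sigma, 0.975, 0.025);      // inverse CDF for p = 0.975
m = cdfnor("Mean", sigma, p, q, 1.96);         // mean recovered from p, q, x
s = cdfnor("Std", p, q, 1.96, mu);             // standard deviation recovered
mprintf("p = %.6f, x = %.6f, mean = %g, std = %g\n", p, x, m, s);
```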
Because the normal distribution is commonly found in the analysis of physical measurements, it
is often recommended that you check whether your data set (your sample) follows the normal
distribution. In this section we present two graphical approaches for checking whether your data
follow the normal distribution. The first consists of superimposing a normal distribution pdf,
based on the mean value and standard deviation of the sample, on top of the sample
histogram. The second approach consists of plotting the data against what are commonly known as
their normal scores. The resulting graph is equivalent to plotting the data on normal
probability paper, i.e., a paper with one scale representing the normal probability
corresponding to the data set. These two approaches are described next.
Plotting a histogram and its corresponding normal curve
The purpose of this plot is to visually check if the histogram of a sample, with a suitable
number of classes, matches a superimposed normal curve. For that purpose we propose the
following SCILAB user-defined function, histnorm:
function [chi2,cmark,fcount]=histnorm(x, xclass)
//This function calculates the frequency distribution
//for the data in (row) vector x according to the
//class boundaries contained in the (row) vector
//xclass. It also produces a histogram of the
//data and the normal curve that best fits the data.
//
//Typical call: [chi2,cm,f] = histnorm(x,xclass)
//where cm = class marks, f = frequency count,
//      chi2 = chi-square parameter for the fitting
[m n] = size(x);          //Sample size
[m nB] = size(xclass);    //Number of class boundaries
k = nB - 1;               //Number of classes
//Calculate class marks
cmark = zeros(1,k);
for ii = 1:k
   cmark(ii) = 0.5*(xclass(ii)+xclass(ii+1));
end
//Initialize frequency counts to zero
fcount=zeros(1,k);
fbelow=0; fabove=0;
//Accumulate frequency counts
for ii = 1:n
   if x(ii) < xclass(1)
      fbelow = fbelow + 1;
   elseif x(ii) > xclass(nB)
      fabove = fabove + 1;
   else
      for jj = 1:k
         if x(ii)>= xclass(jj) & x(ii)< xclass(jj+1)
            fcount(jj) = fcount(jj) +1;
         end
      end
   end
end
//define normal CDF, calculate xbar, sx, chi-square parameter
nn = sum(fcount);
xbar = mean(x); sx = st_deviation(x);
xmin = min(xclass); xmax = max(xclass);
pk = [];
for j = 1:k+1
   pk = [pk cdfnor("PQ",xclass(j),xbar,sx)];
end;
p_in_classes = pk(k+1)-pk(1);
pxclass = pk(2:k+1) - pk(1:k);
fc = pxclass*nn*p_in_classes;
//Chi square parameter
chi2=0;
for j = 1:length(fc)
   chi2 = chi2 + (fcount(j)-fc(j))^2/fc(j);
end;
//Produce normal distribution for data
Dx = (xmax-xmin)/100;
xx = [xmin:Dx:xmax];
xxx = xx(1:100) + Dx/2;
pkk = [];
for j = 1:101
   pkk = [pkk cdfnor("PQ",xx(j),xbar,sx)];
end;
pp = pkk(2:101) - pkk(1:100);
fcc = pp*p_in_classes*nn*100/k;
//Determine plot rectangle
ymin = 0;
ymaxf = max(fcount); ymaxy = max(fcc);
ymax = max(ymaxf,ymaxy);
ymax = int(1.1*ymax);
plotrectangle = [xmin ymin xmax ymax];
//plot the histogram and normal curve
xp = xclass(1:k);
xset('window',1);xbasc(1);
plot2d2('onn',xclass',[fcount fcount(k)]',[1],'011','y',[xmin ymin xmax ymax]);
plot2d3('onn',xp',fcount',[1],'000');
plot2d(xxx',fcc',[2],'000');
xtitle('Histogram with normal curve','x','frequency');
//end function histnorm
Notice that this function uses SCILAB function cdfnor to calculate values of the cumulative
distribution function for the normal distribution where needed. The general call to the
function is:
[chi2,cm,f] = histnorm(x,xclass)
which returns, in general, the class marks, cm, the frequency count, f, and a chi-square
parameter defined as
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - fc_i)^2}{fc_i},$$
where f_i is the actual frequency count for the i-th class, fc_i is the estimated frequency count
obtained from the normal distribution for the i-th class, and k is the number of classes in the
frequency distribution.
The χ² parameter follows the chi-square distribution with ν = k-1 degrees of freedom, and it is
used to check the hypothesis that the frequency distribution under consideration does indeed
follow the normal distribution. The subject of hypothesis testing is developed in a later chapter;
therefore, we delay until then the use of the parameter returned from function histnorm.
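To make the definition of the chi-square parameter concrete, the following sketch evaluates the sum for a small set of frequency counts. The observed and expected counts are hypothetical values chosen only so the arithmetic can be followed by hand; they are not produced by function histnorm.

```scilab
// Hypothetical observed (f) and expected (fc) frequency counts for k = 4 classes
f  = [ 8 12 10 10];
fc = [10 10 10 10];
// chi-square parameter: sum over classes of (f_i - fc_i)^2 / fc_i
chi2 = sum((f - fc).^2 ./ fc);
mprintf("chi2 = %f\n", chi2);
```

Here the two classes that differ from the expected count each contribute (2^2)/10 = 0.4, so the sum is 0.8.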
Application of the function histnorm
In this example we apply function histnorm to a set of 200 data values between 0 and 100
generated using function rand as follows:
-->x = int(100*rand(1,200));
First, we check the minimum and maximum value of the data:
-->min(x), max(x)
ans =
0.
ans =
99.
A set of class boundaries of 0, 10, 20, ..., 100 will produce 10 classes for this sample:
-->xclass = [0:10:100];
Next, we load the function histnorm and apply the function to the data stored in x using the
class boundaries stored in xclass:
-->getf('histnorm')
-->histnorm(x,xclass)
ans =
1.9583514
The value returned is the chi-square parameter for the normal curve fitting. The plot of the
histogram with the super-imposed normal curve is:
A second example for the same data sample is presented next, in which we use 20 classes, with
class boundaries 0, 5, 10, ..., 95, 100, to classify the data:
-->xclass=[0:5:100];
The results from function histnorm are the chi-square parameter and the following plot:
-->histnorm(x,xclass)
ans =
2.0146916
The function can be invoked with a vector of three values in the left-hand side to produce not
only the chi-square parameter and the plot, but also the class marks and the frequency count
of the sample:
-->[X2,cm,f] = histnorm(x,[0:10:100])
f =
column 1 to 9
! 20. 18. 27. 18. 23. 22. 16. 18. 14. !
column 10
! 24. !
cm =
! 5. 15. 25. 35. 45. 55. 65. 75. 85. 95. !
X2 =
1.9583514
Notice that in the two graphs shown above, the normal curve does not fit the histograms very
well. The main reason is that the data were generated from a uniform distribution (i.e., using
the default settings of SCILAB's function rand) and not from a normal distribution. Later in
this chapter we deal with the generation of data from distributions other than the uniform, and
will use function histnorm to check how well those data fit the normal distribution.
Plotting data against their normal scores
Assume that the continuous random variable X follows the normal distribution with mean μ and
standard deviation σ. Given a probability p (0<p<1) such that P(X<x)=p with X ~ N(μ,σ), then
the value of x is referred to as the normal score for p. [Note: In some references in the
statistical literature the normal scores are related to a probability α = 1 - p, so that if
P(X>x_α) = α, with X ~ N(μ,σ), x_α is the normal score associated with α.]
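The lognormal distribution
A random variable X follows the lognormal distribution if ln(X) is normally distributed with
mean μ_ln(X) and standard deviation σ_ln(X). Its pdf is
$$f_X(x) = \frac{1}{x\,\sigma_{\ln(X)}\sqrt{2\pi}} \exp\left[-\frac{(\ln x - \mu_{\ln(X)})^2}{2\sigma_{\ln(X)}^2}\right], \quad x > 0,$$
with
$$\mu_X = \exp\left(\mu_{\ln(X)} + \tfrac{1}{2}\sigma_{\ln(X)}^2\right), \qquad Var(X) = \exp\left(2\mu_{\ln(X)} + \sigma_{\ln(X)}^2\right)\left(\exp(\sigma_{\ln(X)}^2) - 1\right).$$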
For calculating probabilities we can use the normal distribution cdf by first calculating the
natural log of the variable. For example, if X ~ lognormal(μ_ln(X) = 1.2, σ_ln(X) = 0.5), to
calculate the probability P(X<2) use P(X<2) = P(ln(X)<ln(2)) = P(Y<0.6931), where Y ~ N(1.2, 0.5).
We can use function cdfnor to calculate this probability in SCILAB as follows:
-->cdfnor('PQ',log(2),1.2,0.5)
ans =
.1553616
Suppose that we want to find the inverse cumulative distribution function, i.e., a value x
for which P(X<x) = 0.35, given μ_ln(X) = 1.2, σ_ln(X) = 0.5. We can use:
-->cdfnor('X',1.2,0.5,0.35,0.65)
ans =
1.0073398
The previous result actually gives a value of Y = ln(X) with Y ~ N(1.2, 0.5). The corresponding
value of X is calculated as X = exp(Y), i.e.,
-->exp(ans)
ans =
2.7383068
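The two calculations above can be wrapped into a pair of small functions. The following is a minimal sketch: the names lognormal_cdf and lognormal_inv are our own (they are not SCILAB functions), mu and sigma stand for the mean and standard deviation of ln(X), and the listing assumes a recent SCILAB release that accepts the function ... endfunction syntax.

```scilab
// Hypothetical helper functions (not part of SCILAB).
// mu, sigma = mean and standard deviation of ln(X).
function p = lognormal_cdf(x, mu, sigma)
    // P(X < x) = P(ln(X) < ln(x)), with ln(X) ~ N(mu, sigma)
    [p, q] = cdfnor("PQ", log(x), mu, sigma);
endfunction

function x = lognormal_inv(p, mu, sigma)
    // Value y of ln(X) with P(ln(X) < y) = p, transformed back with exp
    y = cdfnor("X", mu, sigma, p, 1 - p);
    x = exp(y);
endfunction

p = lognormal_cdf(2, 1.2, 0.5);
x = lognormal_inv(0.35, 1.2, 0.5);
mprintf("P(X<2) = %f, x for p = 0.35: %f\n", p, x);
```

With the parameters used in the text, the two values printed should reproduce the results obtained above, i.e., approximately 0.1553616 and 2.7383068.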
A graph of the lognormal probability density function for μ_ln(X) = 1.2, σ_ln(X) = 0.5 is
produced by using:
-->deff('[ff]=fX(x,mu,sigma)',...
-->'ff=exp(-(log(x)-mu).^2./(2.*sigma.^2))./(sigma.*x.*sqrt(2.*%pi))')
-->mu=1.2;sigma=0.5;xx=[0.01:0.1:10];yy=fX(xx,mu,sigma);
-->plot(xx,yy,'x','fX(x)','Log-normal pdf')
Generating synthetic data
In this section we present pre-defined and user-defined functions that allow us to generate
data that follow a particular probability distribution. We refer to such data as synthetic
data.
Generating normally-distributed synthetic data
In the examples presented in the previous section on applications of the normal distribution we
generated data by using the function rand, which, by default, produces random data uniformly
distributed in the interval [0,1]. The function rand can also be used to produce normally
distributed data, z, that follow the standard normal distribution, i.e., Z ~ N(0,1), by, first,
using the function call
rand('normal')
and next using the function call
rand(n,m)
where n and m are integers. The last call to function rand will produce a matrix of n rows and
m columns whose elements are random numbers following the standard normal distribution.
Recalling that the standardized normal variate is defined as
Z = (X-μ)/σ,
values of x can be obtained from values of z by using
x = μ + σ z.
The following example illustrates how to use function rand to produce 200 data points that
follow the normal distribution with mean μ = 150 and standard deviation σ = 50:
-->x = 150 + 50.*rand(1,200);
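Note that the statement above produces normally distributed data only if rand has first been switched to its normal mode. The following sketch puts the complete sequence together and compares the sample mean and standard deviation with the target values. The variable names are ours, the standard deviation is computed directly from its definition so that the listing does not depend on any particular statistics function, and the quoted string arguments assume a recent SCILAB release.

```scilab
mu = 150; sigma = 50; n = 200;
rand("normal");                 // rand now returns N(0,1) values
z = rand(1, n);                 // standard normal variates
rand("uniform");                // restore the default behaviour of rand
x = mu + sigma .* z;            // x = mu + sigma*z
xbar = mean(x);
sx = sqrt(sum((x - xbar).^2) / (n - 1));   // sample standard deviation
mprintf("sample mean = %f, sample std = %f\n", xbar, sx);
```

For a sample of 200 points the printed values should be close to, but not exactly equal to, 150 and 50.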
To verify that the data do indeed follow the normal distribution, we apply functions histnorm and
normplot to this data set. To use function histnorm, we first determine the minimum
and maximum values of the data set to decide which class boundaries to use in the histogram:
-->xmin = min(x), xmax = max(x)
xmin = 34.558873
xmax = 317.59609
We select for class boundaries the values 25, 50, 75, ..., 300, 325:
-->xclass = [25:25:325];
The resulting histogram and superimposed normal curve are shown next:
-->histnorm(x,xclass);
The fitting of the histogram to the corresponding normal curve is relatively good, in spite of
the apparent discrepancy towards the center of the data. We can also use function normplot
to check the normality of the data as follows:
-->normplot(x)
The resulting normal probability plot is:
The plot suggests that the data follows the normal distribution for most of the range except for
values larger than about 220.
Additional applications of function rand
SCILAB's function rand, like most numerical random number generators, uses a number, known
as the seed, to produce random numbers. To find out the current value of the seed in
function rand use:
-->rand('seed')
ans = 8.096E+08
To find out which type of random number generator is active in function rand (i.e., normal or
uniform) use:
-->rand('info')
ans = normal
To change the function rand back to uniform use:
-->rand('uniform')
To change the seed to the number 15, for example, use:
-->rand('seed',15)
The first 10 random numbers generated by rand after seeding it with a value of 15 are:
-->rand(1,10)
ans =
column 1 to 5
! .1018111 .5348560 .9628528 .1235873 .6667947 !
column 6 to 10
! .4106913 .6578733 .6756193 .1201851 .0268646 !
After generating those 10 random numbers the value of seed has changed to:
-->rand('seed')
ans =
57691269.
If, for some reason, you need to re-start the previous sequence of random numbers, you can
simply re-seed function rand with the value of 15:
-->rand('seed',15)
Check that you get the same sequence of random numbers by comparing the following 5
random numbers with the first 5 random numbers generated earlier after using seed = 15:
-->rand(1,5)
ans =
! .1018111 .5348560 .9628528 .1235873 .6667947 !
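The re-seeding procedure can also be checked programmatically instead of by eye. The sketch below, which assumes a recent SCILAB release with quoted string arguments, draws the same five numbers twice and compares the two vectors element by element; the variable names a and b are ours.

```scilab
rand("uniform");        // make sure rand is in its default (uniform) mode
rand("seed", 15);       // seed the generator
a = rand(1, 5);         // first five numbers of the sequence
rand("seed", 15);       // re-seed with the same value
b = rand(1, 5);         // the sequence starts over
same = and(a == b);     // true if the two vectors are identical
disp(same);
```

Because the generator is deterministic once the seed is fixed, the two vectors are identical and the comparison is true.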
SCILAB function for generating synthetic data
SCILAB provides function grand (generating random numbers) to generate a vector or matrix
with data that follow, among others, the following distributions: binomial, Poisson, gamma,
beta, exponential, uniform integer, uniform real, normal, chi-squared, and F. Two
general calls to the function are:
[x] = grand(m,n,dist_type,dist_parameters)
[x] = grand(A,dist_type,dist_parameters)
where dist_type is a string identifying the type of distribution, and dist_parameters is a list of
the parameters defining the distribution. In the first form of the call the values m and n
represent the number of rows and columns of a matrix to be generated containing random
numbers that follow the desired distribution. In the second form of the function call an
existing matrix A is provided so that the function generates a new matrix with the same
dimensions as A containing the random numbers that follow the desired distribution.
The following strings identify the type of distribution requested. We also identify the
parameters required for each distribution:
String   Distribution       Parameters
'bin'    binomial           N, P
'poi'    Poisson            μ
'bet'    beta               α, β
'gam'    gamma              α = shape, β = scale
'exp'    exponential        1/λ
'nor'    normal             μ, σ
'chi'    chi-square         ν
'f'      F                  ν_N, ν_D
'uin'    uniform integer    a, b
'unf'    uniform real       a, b
The specific function calls for each probability distribution are shown next:
Binomial: x=grand(m,n,'bin',N,P), x=grand(A,'bin',N,P)
Poisson: x=grand(m,n,'poi',μ), x=grand(A,'poi',μ)
Beta: x=grand(m,n,'bet',α,β), x=grand(A,'bet',α,β)
Gamma: x=grand(m,n,'gam',α,β), x=grand(A,'gam',α,β)
Exponential: x=grand(m,n,'exp',1/λ), x=grand(A,'exp',1/λ)
Normal: x=grand(m,n,'nor',μ,σ), x=grand(A,'nor',μ,σ)
Chi-square: x=grand(m,n,'chi',ν), x=grand(A,'chi',ν)
F-distribution: x=grand(m,n,'f',ν_N,ν_D), x=grand(A,'f',ν_N,ν_D)
Uniform integer: x=grand(m,n,'uin',a,b), x=grand(A,'uin',a,b)
Uniform real: x=grand(m,n,'unf',a,b), x=grand(A,'unf',a,b)
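The following sketch shows three of these calls side by side and prints a simple summary of each sample. The sample size and the parameter values are arbitrary choices of ours, and the quoted distribution strings assume a recent SCILAB release.

```scilab
n = 1000;                               // hypothetical sample size
x_nor = grand(1, n, "nor", 10, 2);      // normal data, mean 10, std. deviation 2
x_poi = grand(1, n, "poi", 4);          // Poisson data, mean 4
x_uin = grand(1, n, "uin", -5, 5);      // integers between -5 and 5
mprintf("normal:  sample mean = %f\n", mean(x_nor));
mprintf("Poisson: sample mean = %f\n", mean(x_poi));
mprintf("uniform integer: min = %d, max = %d\n", min(x_uin), max(x_uin));
```

The sample means should fall near the requested values, and the uniform integer data should stay within the limits a = -5 and b = 5.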
Examples of synthetic data generation using function grand
The following examples demonstrate how to use function grand to generate sets of 200 data
points that follow specific probability distributions. After the data are generated we
determine their maximum and minimum values, select class boundaries for histograms of the
data, and use functions histnorm and normplot to check how close the data are to normality.
We start the exercises by loading these two functions:
-->getf('histnorm');getf('normplot');
Binomial data
-->x=grand(1,200,'bin',20,0.35);xmin=min(x),xmax=max(x)
xmin = 2.
xmax = 14.
-->xclass=[2:2:14];xset('window',1);histnorm(x,xclass);
-->xset('window',2);normplot(x);
Poisson data
-->x=grand(1,200,'poi',12.5);xmin=min(x),xmax=max(x)
xmin =
4.
xmax =
23.
-->xclass=[4:2:24];xset('window',1);histnorm(x,xclass);
-->xset('window',2);normplot(x);
Beta data
-->x=grand(1,200,'bet',2,3);xmin=min(x),xmax=max(x)
xmin = .0480813
xmax = .9132797
-->xclass=[0:0.1:1];xset('window',1);histnorm(x,xclass);
-->xset('window',2);normplot(x);
Gamma data
-->x=grand(1,200,'gam',2,3);xmin=min(x),xmax=max(x)
xmin = .0042184
xmax = 2.6455776
-->xclass=[0:0.4:2.8];xset('window',1);histnorm(x,xclass);
-->xset('window',2);normplot(x);
Normal data
-->x=grand(1,200,'nor',2500,1250);xmin=min(x),xmax=max(x)
xmin =
-1294.6718
xmax =
6467.2541
-->xclass=[-1000:1000:7000];xset('window',1);histnorm(x,xclass);
-->xset('window',2);normplot(x);
Chi-square data
-->x=grand(1,200,'chi',12);xmin=min(x),xmax=max(x)
xmin =
3.8312405
xmax =
28.583772
-->xclass=[0:3:30];xset('window',1);histnorm(x,xclass);
-->xset('window',2);normplot(x);
F distribution data
-->x=grand(1,200,'f',10,5);xmin=min(x),xmax=max(x)
xmin = .110966
xmax = 53.694396
-->xclass=[0:10:60];xset('window',1);histnorm(x,xclass);
-->xset('window',2);normplot(x);
-->xclass=[0:2:12];histnorm(x,xclass);
-->xclass=[0:0.5:6];histnorm(x,xclass);
Uniform integer data
-->x=grand(1,200,'uin',-5,5);xmin=min(x),xmax=max(x)
xmin =
-5.
xmax =
5.
-->xclass=[-5:1:5];xset('window',1);histnorm(x,xclass);
-->xset('window',2);normplot(x);
Uniform real data
-->x=grand(1,200,'unf',-5,5);xmin=min(x),xmax=max(x)
xmin =
-4.9677424
xmax =
4.9660118
-->xclass=[-5:1:5];xset('window',1);histnorm(x,xclass);
-->xset('window',2);normplot(x);
Additional notes on function grand
The previous examples were used to illustrate applications of function grand to the generation
of data that follows the binomial, Poisson, gamma, beta, exponential, normal, chi-square, F-,
uniform integer, and uniform real distributions. Function grand allows the user to obtain data
that follow other distributions that are not presented in this book, such as the negative
binomial distribution, the multinomial distribution, the non-central F distribution, and the non-
central chi-square distribution. (To find information about these and other distributions
consult a statistics and probability textbook such as Spanos, A., 1999, Probability Theory and
Statistical Inference - Econometric Modeling with Observational Data, Cambridge University
Press, Cambridge, U.K.).
To obtain additional details on the use of function grand use:
-->help grand
Function grand has access to 32 different random number generators that constitute the basis
upon which random numbers that follow a particular probability distribution are generated. By
default, functions rand and grand use generator number 1. To check which random number
generator is currently active use:
-->grand('getcgn')
ans =
1.
This result indicates that you are currently using SCILAB's default random number generator.
The random number generators provided by SCILAB for use with function grand require two
seed numbers. To see the current seed numbers you can use the statement:
-->seeds = grand('getsd')
seeds =
1.0E+08 *
! 20.45933 9.2172801 !
You can re-initialize those seeds to the original seeds by using:
-->grand('initgn',-1)
ans =
1.
We can check the initial seeds after re-initialization by using:
-->seeds = grand('getsd')
seeds =
1.0E+08 *
! 12.345679 1.2345679 !
You can also re-seed the generator (i.e., provide new seeds) by using the following call to
function grand:
-->grand('setall',10,20)
ans =
setall
To check that the new seeds are active use:
-->seeds=grand('getsd')
seeds =
! 10. 20. !
To change the random number generator from generator number 1 to generator number 5, for
example, use:
-->grand('setcgn',5)
ans =
5.
The following call to function grand can be used to verify that the change of generator has
been made:
-->grand('getcgn')
ans =
5.
To check the values of the seeds for the current generator use:
-->seeds=grand('getsd')
seeds =
! 3.795E+08 77757764. !
Pseudo-random generators
The random number generators used in SCILAB and other computer applications are known as
pseudo-random generators because, after generating a sufficiently long sequence of numbers,
the numbers start repeating. Therefore, they are not strictly random generators, but only
quasi-random or pseudo-random.
The random number generator provided with SCILAB is able to produce 2.3×10^18 numbers
before repetition of numbers occurs. This collection of numbers is partitioned into 32
pseudo-random generators, each containing 2^20 = 1,048,576 blocks of non-overlapping random
numbers. Each block is 2^30 = 1,073,741,824 in length.
Given the size of the sequences of random numbers that can be generated with each of
SCILAB's 32 pseudo-random number generators, we are confident that the numbers thus
generated are random enough for most practical applications. Furthermore, use of the default
generator should be enough for most applications.
Another application of function grand is in the generation of permutations of a column vector.
For example, the following application produces 10 permutations of the vector M containing
the first five positive integers. The permutations are shown as columns of a matrix.
-->M = [1 2 3 4 5]';
-->grand(10,'prm',M)
ans =
! 1. 2. 4. 1. 4. 4. 5. 4. 1. 3. !
! 3. 1. 2. 4. 2. 2. 1. 3. 4. 2. !
! 2. 3. 5. 5. 5. 3. 2. 2. 2. 5. !
! 5. 4. 3. 3. 3. 5. 3. 1. 3. 4. !
! 4. 5. 1. 2. 1. 1. 4. 5. 5. 1. !
Generating log-normally-distributed data
To generate log-normally distributed data we first generate a set of normally distributed data
and then apply the exponential function to that data set. For example, if X follows the
lognormal distribution with μ_ln(X) = 1.2, σ_ln(X) = 0.5, we can use the following SCILAB
commands to generate a set of 200 data points. We apply functions histnorm and normplot to this
data set to check how close the data are to normality.
-->y=grand(1,200,'nor',1.2,0.5); //Generate normal data N(1.2,0.5)
-->x=exp(y); //Generate log-normal data by using exp
-->xmin=min(x),xmax=max(x) //Determine min and max values
xmin =
1.1210567
xmax =
11.161347
-->xclass=[0:2:12];histnorm(x,xclass); //Histogram
-->normplot(x); //Normal probability plot
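One way to check data generated in this manner is to compare the sample mean with the mean of the lognormal distribution, exp(μ_ln(X) + σ_ln(X)²/2). The sketch below does so for a larger sample than the one used above; the sample size of 5000 and the variable names are our own choices, and the quoted distribution string assumes a recent SCILAB release.

```scilab
mu = 1.2; sigma = 0.5;          // mean and std. deviation of ln(X)
n = 5000;                       // hypothetical sample size
y = grand(1, n, "nor", mu, sigma);    // normal data N(mu, sigma)
x = exp(y);                           // log-normal data
mean_theory = exp(mu + sigma^2/2);    // mean of the lognormal distribution
mprintf("sample mean = %f, theoretical mean = %f\n", mean(x), mean_theory);
```

The theoretical mean for these parameters is exp(1.325), approximately 3.76, and the sample mean should be close to it.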
Generating data that follows the Weibull distribution
SCILAB does not provide a function to generate data that follow the Weibull distribution;
however, using the uniformly distributed random numbers from function rand we can generate
numbers p between 0 and 1 that represent probabilities p = F_X(x) = P(X<x). Next, we use the
cumulative distribution function for the Weibull distribution, namely,
$$F(x) = 1 - \exp(-\alpha x^{\beta}), \quad \text{for } x > 0,\ \alpha > 0,\ \beta > 0,$$
and solve for x given values of p, i.e.,
$$x = \left[-\frac{\ln(1-p)}{\alpha}\right]^{1/\beta}.$$
The following SCILAB commands are used to generate 200 data points that follow the Weibull
distribution with α = 2, β = 3. We also use functions histnorm and normplot to check how close
these data are to normality.
-->getf('histnorm');getf('normplot') //Load functions
-->p=rand(1,200); //Generate probabilities
-->a=2; b=3; //parameters of Weibull distribution
-->x = (-log(1-p)/a).^(1/b); //generate Weibull data
-->xmin=min(x), xmax = max(x) //check data range
xmin =
.1230276
xmax =
1.3553315
-->xclass = [0:0.1:1.4]; //select classes for histogram
-->histnorm(x,xclass); //plot histogram and normal curve
-->normplot(x) //create normal probability plot
It is interesting to notice that this Weibull data is very close to normality.
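The inversion of the Weibull cdf can be packaged as a function and checked by substituting the generated values back into F(x) = 1 - exp(-a x^b). The following is a sketch only: the name weibull_rand is ours (it is not a SCILAB function), a and b play the roles of the two Weibull parameters used above, and the listing assumes a recent SCILAB release.

```scilab
// Hypothetical helper: m-by-n matrix of Weibull data by inversion of the cdf
function x = weibull_rand(m, n, a, b)
    p = rand(m, n, "uniform");            // probabilities in [0,1)
    x = (-log(1 - p) ./ a) .^ (1 / b);    // x = [-ln(1-p)/a]^(1/b)
endfunction

a = 2; b = 3;
x = weibull_rand(1, 200, a, b);

// Deterministic check: start from known probabilities, invert, and map back
p_check = [0.1 0.5 0.9];
x_check = (-log(1 - p_check) ./ a) .^ (1 / b);
p_back  = 1 - exp(-a .* x_check .^ b);    // should reproduce p_check
disp(p_back);
```

The element-wise operators (./ and .^) are used so that the same expression works for vectors and matrices of probabilities.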
Generating data that follows the Student's t distribution
Function grand does not allow for the generation of data following the Student's t distribution.
However, SCILAB provides function cdft, which lets you obtain the inverse of the cumulative
distribution. Using an approach similar to that shown above for the Weibull distribution, we
can generate random probability values through function rand, and then use function cdft to
generate the data required.
The following example illustrates the procedure:
-->getf('histnorm');getf('normplot'); //Load functions histnorm & normplot
-->pp = rand(1,200); //Generate random probabilities
-->x = []; //Start with an empty vector
-->for j =1:200 //Calculate the values of x one at a time
--> x = [x cdft('T',6,pp(j),1-pp(j))];
-->end;
-->xmin=min(x), xmax=max(x) //Determine min & max values
xmin = -6.9441809
xmax = 3.4425429
-->xclass=[-7:1:4];xset('window',1);histnorm(x,xclass); //Histogram
-->xset('window',2);normplot(x); //Normal probability plot
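The loop in the example above can be avoided if cdft is called with vectors. The sketch below assumes, as we understand the behavior of SCILAB's cdf functions, that all vector arguments must have the same size, so the degrees of freedom are repeated in a vector of the same length as the probabilities. The variable names are ours.

```scilab
n = 200; df = 6;                   // sample size and degrees of freedom
pp = rand(1, n, "uniform");        // random probabilities
// One call to cdft for all the probabilities; df is expanded to a vector
x = cdft("T", df * ones(1, n), pp, 1 - pp);
mprintf("min = %f, max = %f, median = %f\n", min(x), max(x), median(x));
```

Since the Student's t distribution is symmetric about zero, the sample median should be close to zero.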
Generating data that follows a discrete distribution
Using function grand we were able to generate discrete data that follow the binomial,
Poisson, and uniform integer distributions. In this section we present a general method for the
generation of data given a discrete distribution in the form of a table. For example, the
following table shows the probability mass function, f_X(x) = P(X=x), and cumulative distribution
function, F_X(x) = P(X≤x), of a discrete random variable X:
                             Random numbers
  X      f_X(x)    F_X(x)    From     to
 0.5     0.10      0.10      0.00    0.10
 1.5     0.25      0.35      0.10    0.35
 2.5     0.20      0.55      0.35    0.55
 3.5     0.15      0.70      0.55    0.70
 4.5     0.15      0.85      0.70    0.85
 5.5     0.15      1.00      0.85    1.00
The last two columns of the table represent the range of probabilities corresponding to the
cumulative distribution function for each value of X. The procedure for generating data
consists of obtaining a random probability p from a uniform distribution, e.g.,
using function rand, and then assigning a value of X according to the range of values of the
random numbers. Thus, if function rand produces the random number 0.25, we assign to x the
corresponding value X = 1.5.
The following function, discrand, will generate a matrix of n×m random numbers
given vectors of values of X and FX, representing the values of a discrete random variable and
its corresponding cumulative distribution function.
function [x] = discrand(n,m,xx,FX)
//A function to generate a matrix nxm
//following a discrete probability distribution
//represented by vectors xx and FX = P(X<=xx)
nx = length(xx);
pp = rand(n,m);
x = zeros(n,m);
FXX = [0.00 FX];
for i = 1:n
   for j = 1:m
      for k = 1:nx
         if pp(i,j)>FXX(k) & pp(i,j)<=FXX(k+1) then
            x(i,j) = xx(k);
         end;
      end;
   end;
end;
//end function discrand
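The table lookup performed by the innermost loop of discrand can also be written with function find, which returns the position of the first class whose cumulative probability reaches the random number. The following sketch uses the values of X and F_X(x) from the table above; the sample size of 2000 and the variable names are our own, and the listing assumes a recent SCILAB release.

```scilab
xx = [0.5 1.5 2.5 3.5 4.5 5.5];              // values of X
FX = [0.10 0.35 0.55 0.70 0.85 1.00];        // cumulative distribution function

// Single lookup: the random number 0.25 falls in the range 0.10 to 0.35
k1 = find(0.25 <= FX, 1);                    // index of the first class with FX >= 0.25
x1 = xx(k1);                                 // value of X assigned to 0.25

// Sample of 2000 values and their relative frequencies
pp = rand(1, 2000, "uniform");
x = zeros(pp);
for i = 1:length(pp)
    x(i) = xx(find(pp(i) <= FX, 1));
end
freq = zeros(xx);
for k = 1:length(xx)
    freq(k) = length(find(x == xx(k))) / length(x);
end
disp(freq);
```

The single lookup assigns X = 1.5 to the random number 0.25, as in the text, and the relative frequencies should be close to the probability mass function in the table (0.10, 0.25, 0.20, 0.15, 0.15, 0.15).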
An application of the function to generate 200 data points that follow the probability
distribution shown in the table above is presented next. We first load function discrand, then
enter the values of X and F_X(x), and generate a row vector of 200 points. Next, we load
functions histnorm and normplot to check how well the data follow a normal distribution.
-->getf('discrand')
-->X = [0.5:1.0:5.5]; FX = [0.10,0.35,0.55,0.70,0.85,1.00];
-->x=discrand(1,200,X,FX);
-->getf('histnorm');getf('normplot');
-->xmin=min(x), xmax=max(x)
xmin =
.5
xmax =
5.5
-->xclass=[0.5:0.5:5.5];
-->histnorm(x,xclass)
ans =
24.643214
-->normplot(x)
Statistical simulation
Many physical and other types of systems are described by one or more mathematical
relationships (e.g., algebraic, difference, or differential equations) of diverse degrees of
complexity. We will refer to the set of mathematical relationships that describe a physical
system as a model. A model typically depends on certain constant values known as the
parameters of the model. In the simplest of cases, a model can be represented by a "black box"
into which a set of input data is provided, and from which a set of output results is obtained.
This is illustrated in the following figure:
If the model is such that for a given set of input data it always produces a predictable result, it
is referred to as a deterministic model. An example of a deterministic model is the equation
that describes the electric current, I, through a resistor of resistance R when a voltage, V, is
applied across the terminals of the resistor. The equation is
I = V/R.
If we apply a constant voltage V_0 to the resistor, we get back a constant electric current,
I_0 = V_0/R. If we instead apply a variable voltage V(t) = V_0 sin(ωt), we obtain an electric
current, I(t) = (V_0/R) sin(ωt). Thus, knowing the value of the resistance R and the input to the
system, i.e., the voltage, V_0 or V(t), we can always find the value of the electric current. We
cannot get more deterministic than this example.
If the input to the model is of a random nature, or if there is a random component to the
model itself, the model is said to be probabilistic or stochastic. For example, the "black-box"
model described above can be used to describe a hydrological basin. The input data is the
amount and duration of the precipitation falling on the basin over a certain period of time. (A
graphical representation of precipitation vs. time is referred to as a hyetograph.) This input
is, by its own nature, random or stochastic. This means that we cannot know exactly the
amount of precipitation that will occur, say, in the next 24 hours.
Although a hydrological basin is far more complicated than an electric resistor, the
model used to predict the runoff (output) from the system can be a simple relationship involving
one or two parameters. (A graphical representation of the runoff coming out of the basin as a
function of time is known as a hydrograph.) If the input hyetograph is known, then the output
hydrograph can be obtained in a deterministic way. However, because we do not know
exactly the input hyetograph for a particular period of time, except in a statistical manner, the
model is indeed a stochastic one.
By keeping historical records of precipitation in the basin we can get a good idea
of the stochastic nature of precipitation to use as input for our stochastic model. We can then
generate synthetic data representing the precipitation and use it as input to the model. This
approach to modeling physical (or economic, or other types of) systems is known as a Monte
Carlo method. (The name derives from Monte Carlo, a district of the European principality of
Monaco famous for its casinos, where the laws of probability are seen in action
night and day.)
Monte Carlo methods find applicability in all types of models where there is a random
component to the input or parameters of the model. Statistical modeling can be used to
model, for example, economic responses from human populations, the distribution of soil
permeabilities in an aquifer, the distribution of animal or plant populations, traffic patterns in
highways or airports, weather phenomena, etc. A simple application of a Monte Carlo method
to simulate the patterns of traffic through a service station is shown below.
Simulating traffic through a service station
Suppose we want to simulate the traffic through a service station in which only one customer
can be serviced at a time. We also assume that once a customer arrives at the service station,
he or she will not leave until service is provided. This is a simplistic model, but it could be
used to simulate a vehicle service station in a city or highway, a medical emergency room, a
highway service station for state or privately owned trucks, a store, etc.
The first customer arrives at a certain arrival time, AT_1 (Arrival Time). He or she is taken care
of right away, so that the starting time of service for customer 1, ST_1 (Starting Time), coincides
with his or her arrival time; thus, ST_1 = AT_1. The waiting time for customer 1 is, therefore,
zero, i.e., WT_1 = 0. The number of customers awaiting service at this point is also zero, i.e.,
NW_1 = 0. The time required to service this first customer is referred to as TS_1 (Time of
Service). The first customer leaves the service station at time ET_1 = ST_1 + TS_1 (Ending Time).
The second customer arrives at the service station at a time AT_2. If AT_2 < ET_1 (i.e., the second
customer arrives before service for the first one has finished), the second customer must wait
until the first customer leaves, so that ST_2 becomes ET_1 (ST_2 = ET_1). In this case, we can
calculate a waiting time for the second customer equal to WT_2 = ET_1 - AT_2. Also, the number of
customers waiting for service at this point is NW_2 = 1. If, instead, the second customer arrives
at a time AT_2 ≥ ET_1, then ST_2 = AT_2, and WT_2 = 0. In any event, the ending time for the second
customer is calculated as ET_2 = ST_2 + TS_2.
We define the inter-arrival time between customers 1 and 2 as IAT_1 = AT_2 - AT_1. In general, the
inter-arrival time between customers i and i+1 is IAT_i = AT_{i+1} - AT_i. The inter-arrival time
(IAT_i) and the time of service (TS_i) are considered random variables of discrete nature. Thus,
IAT_i and TS_i constitute random input to the model.
Suppose that we want to simulate the operation of the service center for n customers. We first
generate n-1 values of inter-arrival time {IAT_1, IAT_2, ..., IAT_{n-1}}, as well as n values of the
service time {TS_1, TS_2, ..., TS_n}. Then, we proceed to calculate the arrival times as
AT_{i+1} = AT_i + IAT_i, i = 1, 2, ..., n-1.
As indicated earlier, the starting and ending times for the first customer are ST_1 = AT_1,
ET_1 = ST_1 + TS_1. Also, the waiting time and number of customers waiting at this stage are both
zero, i.e., WT_1 = 0, and NW_1 = 0. The starting time for customer 2 is obtained as follows:
If AT_2 ≥ ET_1, then ST_2 = AT_2, WT_2 = 0, NW_2 = 0
If AT_2 < ET_1, then ST_2 = ET_1, WT_2 = ET_1 - AT_2, and NW_2 = 1.
For the third customer, we need to check the arrival time, AT
3
, against the ending times of
both the first and second customers so we can determine the starting time, the waiting time,
and the number of customers waiting at that point. The following piece of pseudo-code can
be used to determine such values:
for j = 2:n
    NW_j = 0
    WT_j = 0
    for k = 1:j-1
        if AT_j < ET_k then
            NW_j = NW_j + 1
            WT_j = ET_k - AT_j
            ST_j = ET_k
        else
            ST_j = AT_j
        end
    end
    ET_j = ST_j + TS_j
end
A user-defined function to simulate traffic through a service station
The steps outlined above are put together in the following function, service:
function [MR] = service(IAT,TS)
//Simulation of traffic in a service station
//Given n-1 values of inter-arrival time IAT
//and n values of time of service TS.
//Results:
//Arrival time = AT, Starting time = ST
//Ending time = ET, Waiting time = WT
//Number of waiting customers = NW
//
n = length(TS);
AT = zeros(1,n);
ST = zeros(1,n);
ET = zeros(1,n);
NW = zeros(1,n);
WT = zeros(1,n);
IATT = [IAT 0];
ST(1) = AT(1);
ET(1) = ST(1) + TS(1);
for j = 2:n
    AT(j) = AT(j-1) + IAT(j-1);
end;
for j = 2:n
    NW(j) = 0;
    WT(j) = 0;
    for k = 1:j-1
        if AT(j) < ET(k) then
            NW(j) = NW(j) + 1;
            WT(j) = ET(k) - AT(j);
            ST(j) = ET(k);
        else
            ST(j) = AT(j);
        end;
    end;
    ET(j) = ST(j)+TS(j);
end;
disp(' ');
printf('===============================================================\n');
printf(' j AT IAT ST TS ET WT NW \n');
printf('===============================================================\n');
for j = 1:n
    printf('%3.0f %8.2f %8.2f %8.2f %8.2f %8.2f %8.2f %3.0f\n',...
        j,AT(j),IATT(j),ST(j),TS(j),ET(j),WT(j),NW(j));
end;
printf('===============================================================\n');
MR = [AT' IATT' ST' TS' ET' WT' NW']; //Matrix of Results
printf('AT = arrival times IAT = inter-arrival times \n');
printf('ST = starting times TS = time of service \n');
printf('ET = ending times WT = waiting times \n');
printf('NW = number of customers waiting \n');
disp(' AT IAT ST TS ET WT NW');
//end function service
As an example, suppose that we have the following inter-arrival times (IAT) and times of
service (TS):
-->IAT = [ 0.5 0.75 0.5 0.25 0.5];
-->TS = [ 1 2 1 1 2 1];
We can load function service and run it with the values of IAT and TS defined earlier to obtain
the following results:
-->Matrix_of_results = service(IAT,TS)
===============================================================
j AT IAT ST TS ET WT NW
===============================================================
1 0.00 .50 0.00 1.00 1.00 0.00 0
2 .50 .75 1.00 2.00 3.00 .50 1
3 1.25 .50 3.00 1.00 4.00 1.75 1
4 1.75 .25 4.00 1.00 5.00 2.25 2
5 2.00 .50 5.00 2.00 7.00 3.00 3
6 2.50 0.00 7.00 1.00 8.00 4.50 4
===============================================================
AT = arrival times IAT = inter-arrival times
ST = starting times TS = time of service
ET = ending times WT = waiting times
NW = number of customers waiting
AT IAT ST TS ET WT NW
Matrix_of_results =
! 0. .5 0. 1. 1. 0. 0. !
! .5 .75 1. 2. 3. .5 1. !
! 1.25 .5 3. 1. 4. 1.75 1. !
! 1.75 .25 4. 1. 5. 2.25 2. !
! 2. .5 5. 2. 7. 3. 3. !
! 2.5 0. 7. 1. 8. 4.5 4. !
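As a cross-check on the table above, note that for a single server that attends customers in order of arrival, with nonnegative service times, the ending times never decrease. Under that assumption the starting time reduces to $ST_j = \max(AT_j, ET_{j-1})$. The lines below are a sketch of this shortcut, not the author's function; they rebuild the same columns from the same input:

```scilab
IAT = [0.5 0.75 0.5 0.25 0.5];
TS  = [1 2 1 1 2 1];
n   = length(TS);
AT  = [0 cumsum(IAT)];            // assumes AT(1) = 0, as in function service
ST  = zeros(1,n);  ET = zeros(1,n);  NW = zeros(1,n);
ST(1) = AT(1);
ET(1) = ST(1) + TS(1);
for j = 2:n
    ST(j) = max(AT(j), ET(j-1));  // wait only if the previous customer is still there
    ET(j) = ST(j) + TS(j);
    // count earlier customers that have not yet left when customer j arrives
    NW(j) = length(find(AT(j) < ET(1:j-1)));
end
WT = ST - AT;
// Expected: ST = 0 1 3 4 5 7, ET = 1 3 4 5 7 8,
//           WT = 0 0.5 1.75 2.25 3 4.5, NW = 0 1 1 2 3 4
```

These values agree with the columns ST, ET, WT, and NW produced by function service for this input.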
The function is designed to provide a table of results, as well as a matrix summarizing the
results in case that additional operations on those results are required within SCILAB. The
function, as applied in this case, is purely deterministic in the sense that for the given input we
get a unique result. To work out a stochastic modeling of traffic through a service station we
need to provide random input. The following example shows how to obtain that random input.
Modeling traffic through a service station with random input
Suppose that the inter-arrival times and times of service for the service station model follow
the cumulative probability distributions shown in the following table:
x = IAT   F_X(x)      x = TS   F_X(x)
  0.1      0.05        0.25     0.10
  0.2      0.10        0.50     0.20
  0.3      0.20        0.75     0.40
  0.4      0.35        1.00     0.70
  0.5      0.45        1.25     0.80
  0.6      0.50        1.50     0.90
  0.7      0.70        1.75     0.95
  0.8      0.75        2.00     1.00
  0.9      0.95
  1.0      1.00
We want to analyze the traffic through the service station for 10 customers by generating 9
inter-arrival times and 10 service times from these distributions. The inter-arrival times and
times of service can be generated using function discrand as follows:
-->getf('discrand')
-->xIAT = [0.1:0.1:1.0]; FIAT = [0.05,0.1,0.2,0.35,0.45,0.5,0.7,0.75,0.95,1.0];
-->xTS = [0.25:0.25:2]; FTS = [0.1,0.2,0.4,0.7,0.8,0.9,0.95,1];
-->IAT = discrand(1,9,xIAT,FIAT) //generate IAT data
IAT =
! .4 .7 .7 .5 .4 .7 .5 .9 .1 !
-->TS = discrand(1,10,xTS,FTS) //generate TS data
TS =
! 1. .75 1. .75 .5 1.25 .75 .5 1. .5 !
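For readers who want to see the idea behind sampling from a tabulated cumulative distribution, the following is a minimal sketch of the inverse-transform method: draw a uniform number u and return the smallest x whose cumulative probability is at least u. The function name discsample and its argument order are hypothetical, and this is not the listing of discrand; the inline function ... endfunction syntax also assumes a SCILAB version that supports it.

```scilab
function x = discsample(n, xv, Fv)
    // xv: possible values; Fv: cumulative probabilities (last entry must be 1)
    u = rand(1, n, 'uniform');    // uniform random numbers in [0,1)
    x = zeros(1, n);
    for i = 1:n
        k = find(Fv >= u(i));     // indices whose cumulative probability covers u(i)
        x(i) = xv(k(1));          // smallest such value
    end
endfunction

xTS = 0.25:0.25:2;
FTS = [0.1 0.2 0.4 0.7 0.8 0.9 0.95 1];
TS_sketch = discsample(10, xTS, FTS);   // 10 service times; values differ on every run
```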
With these values of IAT and TS we now call function service:
-->M = service(IAT,TS)
===============================================================
j AT IAT ST TS ET WT NW
===============================================================
1 0.00 .40 0.00 1.00 1.00 0.00 0
2 .40 .70 1.00 .75 1.75 .60 1
3 1.10 .70 1.75 1.00 2.75 .65 1
4 1.80 .50 2.75 .75 3.50 .95 1
5 2.30 .40 3.50 .50 4.00 1.20 2
6 2.70 .70 4.00 1.25 5.25 1.30 3
7 3.40 .50 5.25 .75 6.00 1.85 3
8 3.90 .90 6.00 .50 6.50 2.10 3
9 4.80 .10 6.50 1.00 7.50 1.70 3
10 4.90 0.00 7.50 .50 8.00 2.60 4
===============================================================
AT = arrival times IAT = inter-arrival times
ST = starting times TS = time of service
ET = ending times WT = waiting times
NW = number of customers waiting
AT IAT ST TS ET WT NW
M =
! 0. .4 0. 1. 1. 0. 0. !
! .4 .7 1. .75 1.75 .6 1. !
! 1.1 .7 1.75 1. 2.75 .65 1. !
! 1.8 .5 2.75 .75 3.5 .95 1. !
! 2.3 .4 3.5 .5 4. 1.2 2. !
! 2.7 .7 4. 1.25 5.25 1.3 3. !
! 3.4 .5 5.25 .75 6. 1.85 3. !
! 3.9 .9 6. .5 6.5 2.1 3. !
! 4.8 .1 6.5 1. 7.5 1.7 3. !
! 4.9 0. 7.5 .5 8. 2.6 4. !
Out of the matrix of results, M, we can extract individual columns of data, for example, the
waiting time data corresponds to the sixth column of M:
-->WT = M(:,6)
WT =
! 0. !
! .6 !
! .65 !
! .95 !
! 1.2 !
! 1.3 !
! 1.85 !
! 2.1 !
! 1.7 !
! 2.6 !
The number of waiting customers is extracted from the seventh column of matrix M:
-->NW = M(:,7)
NW =
! 0. !
! 1. !
! 1. !
! 1. !
! 2. !
! 3. !
! 3. !
! 3. !
! 3. !
! 4. !
The columns of data extracted from the matrix of results, M, can be used to obtain statistics
such as the mean and standard deviation:
-->WT_mean = mean(WT), WT_sdev = st_deviation(WT)
WT_mean = 1.295
WT_sdev = .7836701
-->NW_mean = mean(NW), NW_sdev = st_deviation(NW)
NW_mean = 2.1
NW_sdev = 1.2866839
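The standard deviations reported above correspond to the sample standard deviation, i.e., the sum of squared deviations divided by n-1. The following sketch recomputes the waiting-time statistics with elementary operations only; the data vector is copied from the output above:

```scilab
WT = [0 0.6 0.65 0.95 1.2 1.3 1.85 2.1 1.7 2.6];
n  = length(WT);
m  = sum(WT)/n;                        // mean: 1.295
s  = sqrt(sum((WT - m).^2)/(n - 1));   // sample standard deviation: about 0.78367
```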
We can also use function normplot to check how close the data are to normality:
-->getf('normplot')
-->normplot(NW')
-->normplot(WT')
STIXBOX: a rudimentary statistics toolbox
STIXBOX (an abbreviation of statistical toolbox) is a collection of functions that perform
selected statistical and probability calculations. STIXBOX is available for download from the
SCILAB main web page (http://www-rocq.inria.fr/SCILAB/). Instructions for its installation are
provided with the downloaded functions. The package includes a set of help manual pages
that briefly describe the operation of the functions. Once loaded, the manual pages are
available through the main SCILAB Help window.
Probability mass and probability density functions
Probability mass functions, or pmf (for discrete random variables), and probability density
functions, or pdf (for continuous random variables), start with the letter d, e.g., dbeta, dbinom,
etc. Probability mass functions are referred to by $p_X(k) = P[X=k]$, and probability density
functions by $f_X(x)$. Thus, if X ~ Binomial(n,p) with n = 10, p = 0.5, $P[X=2] = p_X(2)$ =
dbinom(2,10,0.5). And, if X ~ Normal($\mu$, $\sigma^2$) with $\mu$ = 1.5, $\sigma$ = 0.2, then
$f_X(1.75)$ = dnorm(1.75,1.5,0.2). The following probability mass and density functions are defined:
dbeta the beta density function
dbinom the binomial probability function
dchisq the chisquare density function
df The F density function [modified by the author, 2/1/2001]
dgamma the gamma density function
dhypgeo the hypergeometric probability function
dnorm the normal density function [modified by the author, 2/1/2001]
dt the student t density function
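The two examples quoted above can be checked against the defining formulas without STIXBOX. The sketch below assumes, as stated in the text, that the third argument of dnorm is the standard deviation $\sigma$:

```scilab
// Binomial pmf: P[X = k] = C(n,k) p^k (1-p)^(n-k)
n = 10; p = 0.5; k = 2;
C  = prod(1:n)/(prod(1:k)*prod(1:n-k));   // number of combinations, 45
pk = C * p^k * (1-p)^(n-k);               // 45/1024 = 0.0439453125

// Normal pdf: f(x) = exp(-((x-mu)/sigma)^2/2) / (sigma*sqrt(2*pi))
mu = 1.5; sigma = 0.2; x = 1.75;
fx = exp(-0.5*((x-mu)/sigma)^2)/(sigma*sqrt(2*%pi));   // about 0.913
```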
Cumulative distribution functions
Cumulative distribution functions (cdf) are referred to as distribution functions when dealing
with continuous variables, or as cumulative probability functions when dealing with discrete
variables. All cdfs in this package start with a p: pbeta, pbinom, etc. Both discrete and
continuous cdfs are referred to by $F_X(x) = P[X \le x]$. Thus, if X ~ Binomial(n,p) with n = 10,
p = 0.5, $P[X \le 2] = F_X(2)$ = pbinom(2,10,0.5). And, if X ~ Normal($\mu$, $\sigma^2$) with
$\mu$ = 1.5, $\sigma$ = 0.2, then $F_X(1.75)$ = pnorm(1.75,1.5,0.2). The following cumulative
distribution functions are defined:
pbeta the beta distribution function
pbinom the binomial cumulative probability function
pchisq the chisquare distribution function
pf The F distribution function
pgamma the gamma distribution function
phypge the hypergeometric cumulative probability function
pnorm the normal distribution function
pt the student t cdf (modified by the author, 2/1/2001)
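The same two probabilities can be obtained with SCILAB's own functions cdfbin and cdfnor, which do not require STIXBOX. This is a sketch for comparison only; it assumes the 'PQ' calling sequences of those functions, in which P is the cumulative probability and Q = 1 - P:

```scilab
// P[X <= 2] for X ~ Binomial(10, 0.5): (1 + 10 + 45)/1024 = 0.0546875
[P1, Q1] = cdfbin('PQ', 2, 10, 0.5, 0.5);

// P[X <= 1.75] for X ~ Normal(mu = 1.5, sigma = 0.2): about 0.894
[P2, Q2] = cdfnor('PQ', 1.75, 1.5, 0.2);
```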
Inverse cumulative distribution functions
Inverse cumulative distribution functions start with q: qbeta, qbinom, etc. If
$F_X(q) = P[X \le q] = p$, then $q = F_X^{-1}(p)$. The value q is also referred to as a quantile
of the distribution. The following inverse cumulative distribution functions are defined:
qbeta the beta inverse distribution function
qbinom the binomial inverse cdf
qchisq the chisquare inverse distribution function
qf The F inverse distribution function
qgamma the gamma inverse distribution function
qhypg the hypergeometric inverse cdf
qnorm the normal inverse distribution function
qt the student t inverse distribution function
quantile empirical quantile (percentile).
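The relation $q = F_X^{-1}(p)$ can be illustrated with a round trip through SCILAB's function cdfnor. The sketch below is not a STIXBOX call; the distribution parameters and the probability p = 0.9 are illustrative values:

```scilab
mu = 1.5; sigma = 0.2;
p  = 0.9;
q  = cdfnor('X', mu, sigma, p, 1-p);     // quantile such that P[X <= q] = p, about 1.756
[P, Q] = cdfnor('PQ', q, mu, sigma);     // P recovers p up to rounding
```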
Generating synthetic data
The generation of synthetic data that follow a particular distribution can be accomplished
with the following random number generators. The names of the random generator functions
begin with r: rbeta, rbinom, etc. SCILAB already provides function rand, which produces
uniformly distributed random numbers (use help rand for more information). The functions
provided by STIXBOX generate random numbers that follow the distributions suggested by the
names of the functions. Thus, if you want to generate n = 10 data values x that follow the
normal distribution, with $\mu$ = 0.5 and $\sigma$ = 0.1, use rnorm(10,0.5,0.1).
rbeta random numbers from the beta distribution
rbinom random numbers from the binomial distribution
rchisq random numbers from the chisquare distribution
rexpweib random numbers from the exponential or weibull distributions
rf random numbers from the F distribution
rgamma random numbers from the gamma distribution
rgeom random numbers from the geometric distribution
rhypg random numbers from the hypergeometric distribution
rjbinom random numbers from the binomial distribution (reject method)
rjgamma generates gamma random deviates (reject method)
rjpoiss random numbers from the poisson distribution (reject method)
rnorm normal random numbers
rpoiss random numbers from the poisson distribution (renewal method)
rt random numbers from the student t distribution
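When STIXBOX is not installed, normally distributed values can also be produced with function rand alone, by scaling and shifting standard normal deviates. This is a minimal sketch with the same illustrative parameters as above; it is an alternative to rnorm, not its listing:

```scilab
n = 10; mu = 0.5; sigma = 0.1;
z = rand(1, n, 'normal');   // standard normal deviates (mean 0, standard deviation 1)
x = mu + sigma*z;           // shift and scale; values differ on every run
```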
Logistic regression
These functions involve the logistic population growth model (see, for example, Example 8.3,
page 504, in Kottegoda, N.T. and R. Rosso, 1997, Probability, Statistics, and Reliability for
Civil and Environmental Engineers, The McGraw-Hill Companies, Inc., New York).
lodds log odds function.
loddsinv compute the inverse of log odds.
logitfit fit a logistic regression model.
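The log odds of a probability p is commonly defined as ln(p/(1-p)), and its inverse maps any real number back to a probability. The sketch below uses that standard definition; that lodds and loddsinv follow exactly this convention is an assumption to be checked against their help pages:

```scilab
p = [0.1 0.5 0.9];
z = log(p ./ (1 - p));             // log odds (natural logarithm); z = 0 for p = 0.5
pback = exp(z) ./ (1 + exp(z));    // inverse transformation recovers p
```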
Statistical graphics
Functions to produce a variety of statistical graphics. A normal probability paper plot is
obtained by using qqnorm. Probability paper plots are also referred to as Q-Q plots. For that
reason the corresponding function names start with qq, e.g., qqgamma, qqnorm, etc. Also of
interest are functions histo, plotsym.
histo plot a histogram
identify identify points on a plot by clicking with the mouse.
pairs pairwise scatter plots (does not work)
plotdens draw a nonparametric density estimate.
plotsym plot with symbols
qqnorm normal probability paper
qqplot plot empirical quantile vs empirical quantile
Binomial coefficients
bincoef calculates binomial coefficients: $\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$
Resampling methods
These methods apply to the process of resampling by which an attempt is made to remove any
existing bias in the sample. For a quick introduction to the jackknife (named so because the
jackknife, like this method, is a useful tool) and the bootstrap (named so from the expression
"lifting oneself by one's bootstraps"), see pp. 116-117 in Kottegoda, N.T. and R. Rosso, 1997,
Probability, Statistics, and Reliability for Civil and Environmental Engineers, The McGraw-Hill
Companies, Inc., New York.
covboot bootstrap estimate of the variance of a parameter estimate.
covjack Jackknife estimate of the variance of a parameter estimate.
stdboot bootstrap estimate of the parameter standard deviation.
stdjack Jackknife estimate of the standard deviation of a parameter.
rboot simulate a bootstrap resample from a sample.
ciboot various bootstrap confidence intervals.
test1b bootstrap t test and confidence interval for the mean.
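To give an idea of what a jackknife calculation involves, the following sketch estimates the standard error of a sample mean by leaving out one observation at a time. It is written from the textbook definition and is not the listing of stdjack; the data are the waiting times obtained in the service station example:

```scilab
x = [0 0.6 0.65 0.95 1.2 1.3 1.85 2.1 1.7 2.6];
n = length(x);
theta = zeros(1, n);
for i = 1:n
    xi = x;
    xi(i) = [];               // delete the i-th observation
    theta(i) = mean(xi);      // leave-one-out estimate of the mean
end
se_jack = sqrt((n-1)/n * sum((theta - mean(theta)).^2));
// For the mean, the jackknife result coincides with s/sqrt(n):
se_classic = sqrt(sum((x - mean(x)).^2)/(n-1))/sqrt(n);   // about 0.2478
```

For statistics other than the mean the two values generally differ, which is where resampling becomes useful.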
Tests, confidence intervals, and model estimation
These are functions related to statistical inference. Of interest for this class are the functions
lsfit, test1n, and test2r. Use the help function to obtain additional information on the
functions.
cmpmod compare small linear model versus large one
ciquant nonparametric confidence interval for quantile
kstwo Kolmogorov-Smirnov statistic from two samples (needs function pks)
linreg linear or polynomial regression
lsfit fit a multiple regression model.
lsselect select a predictor subset for regression
test1n tests and confidence intervals based on a normal sample
test1r test for median equals 0 using rank test
test2n tests and confidence intervals based on two normal samples
test2r test for equal location of two samples using rank test
Stixbox demonstrations
These are SCILAB functions that demonstrate some of the functions contained in STIXBOX
stixdemo demonstrate various stixbox routines.
stixtest a second demo for stixbox
Famous datasets
Function getdata is used to load well-known datasets into the SCILAB environment. The data
sets included are:
1 Phosphorus Data
2 Scottish Hill Race Data
3 Salary Survey Data
4 Health Club Data
5 Brain and Body Weight Data
6 Cement Data
7 Colon Cancer Data
8 Growth Data
9 Consumption Function
10 Cost-of-Living Data
11 Demographic Data
To activate function getdata and load data into variable x use:
--> x = getdata()
This function produces a dialog box displaying the list of data sets. The user can type in the
number of the data set and get back some information about the data set before the set is
loaded. The dialog box produced by getdata() is shown below.
The dialog box shows that we have selected data set number 5. Pressing [OK] will load the
data as well as provide information as shown below.
Examples on probability distributions using STIXBOX
Plot of the standard normal distribution:
-->z=-4:0.1:4;phi=dnorm(z,0,1);plot(z,phi,'z','phi(z)','standard normal')
Plot of the Student-t distribution for $\nu$ = 2, 5, 10, 15, 20:
-->t=-4.0:0.1:4;nu=[2,5,10,15,20];
-->for k=1:5,f=dt(t,nu(k));plot2d(t,f,k,'011',' ',[-4 0 4 0.4]), end
-->xtitle('Student t distribution','t','f(t)')
Plot of the chi-square distribution for nu=5:
-->x=0:0.1:20;nu=5;f=dchisq(x,nu);
-->plot(x,f,'x','f(x)','Chi-square distribution, nu=5')
Plot of the F distribution for nu1=5 and nu2=10:
-->x=0:0.1:5;nu1=5;nu2=10;f=df(x,nu1,nu2);
-->plot(x,f,'F','f(F)','F distribution, nu1=5, nu2=10')
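Determining $z_\alpha$ such that $P(Z > z_\alpha) = \alpha$, or $P(Z < z_\alpha) = 1 - \alpha$.
Also, $z_{\alpha/2}$ is such that $P(Z > z_{\alpha/2}) = \alpha/2$, or
$P(Z < z_{\alpha/2}) = 1 - \alpha/2$: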
-->alpha = 0.05; z_alpha=qnorm(1-alpha), z_alpha2=qnorm(1-alpha/2)
z_alpha = 1.6448536
z_alpha2 = 1.959964
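Determining $t_{\nu,\alpha}$ such that $P(T > t_{\nu,\alpha}) = \alpha$, or
$P(T < t_{\nu,\alpha}) = 1 - \alpha$. Also, $t_{\nu,\alpha/2}$ is such that
$P(T > t_{\nu,\alpha/2}) = \alpha/2$, or $P(T < t_{\nu,\alpha/2}) = 1 - \alpha/2$: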
-->nu=10;alpha=0.01;t_alpha=qt(1-alpha,nu),t_alpha2=qt(1-alpha/2,nu)
t_alpha = 2.7637695
t_alpha2 = 3.1692727
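The normal and Student-t quantiles obtained above with qnorm and qt can be cross-checked with SCILAB's own functions cdfnor and cdft. The sketch below assumes the 'X' and 'T' calling sequences of those functions, which take both the cumulative probability P and its complement Q = 1 - P:

```scilab
alpha = 0.05;
z_alpha  = cdfnor('X', 0, 1, 1-alpha, alpha);        // 1.6448536
z_alpha2 = cdfnor('X', 0, 1, 1-alpha/2, alpha/2);    // 1.959964

nu = 10; alpha = 0.01;
t_alpha  = cdft('T', nu, 1-alpha, alpha);            // 2.7637695
t_alpha2 = cdft('T', nu, 1-alpha/2, alpha/2);        // 3.1692727
```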
Determining
2
,
, such that P(X
2
>
2
) > , or P(X
2
>
2