Probability Distributions With SCILAB

By Gilberto E. Urroz, Ph.D., P.E.
Distributed by infoClearinghouse.com
© 2001 Gilberto E. Urroz
All Rights Reserved

A "zip" file containing all of the programs in this document (and other
SCILAB documents at InfoClearinghouse.com) can be downloaded at the
following site:

http://www.engineering.usu.edu/cee/faculty/gurro/Software_Calculators/Scilab_Docs/ScilabBookFunctions.zip

The author's SCILAB web page can be accessed at:

http://www.engineering.usu.edu/cee/faculty/gurro/Scilab.html

Please report any errors in this document to: gurro@cc.usu.edu

Download at InfoClearinghouse.com 1 2001 Gilberto E. Urroz
PROBABILITY DISTRIBUTIONS
Discrete probability distributions
Bernoulli probability distribution
Binomial probability distribution
Poisson probability distribution
Geometric probability distribution
Hypergeometric probability mass function
Cumulative distribution functions for discrete probability distributions
SCILAB functions for discrete cumulative distribution functions
SCILAB function cdfbin
Discrete probability calculations through user-defined functions
Combinations
Binomial distribution
Poisson distribution
Geometric distribution
Hypergeometric distribution
Continuous probability functions
Factorials and the Gamma function
The gamma distribution
The exponential distribution
The beta distribution
The Weibull distribution
The uniform distribution
User-defined functions for continuous probability distributions
Continuous probability distributions used in statistical inference
The Normal distribution
The Student-t distribution
The Chi-squared ($\chi^2$) distribution
The F distribution
Applications of the normal distribution in data analysis
Plotting a histogram and its corresponding normal curve
Plotting data against their normal scores
The lognormal distribution
Generating synthetic data
Generating normally-distributed synthetic data
Additional applications of function rand
SCILAB function for generating synthetic data
Examples of synthetic data generation using function grand
Additional notes on function grand
Pseudo-random generators
Generating log-normally-distributed data
Generating data that follows the Weibull distribution
Generating data that follows the Student's t distribution
Generating data that follows a discrete distribution
Statistical simulation
Simulating traffic through a service station
A user-defined function to simulate traffic through a service station
Modeling traffic through a service station with random input
STIXBOX: a rudimentary statistics toolbox
Exercises
Probability Distributions
There are a number of mathematical functions that possess the properties of a probability mass
function for discrete random variables or the properties of a probability density function for
continuous random variables. In this section we introduce a number of those functions for the
calculation of probabilities. Because these probability distributions depend on a finite number
of parameters they are typically referred to as parametric distributions.
Discrete probability distributions
Some of the most useful discrete probability distributions are the Bernoulli, Binomial, Poisson,
geometric, and hypergeometric distributions. The definitions of the corresponding probability
mass and distribution functions are shown below. We also present expressions for the mean,
variance, and standard deviation of these distributions.
Bernoulli probability distribution
The Bernoulli probability distribution applies to a discrete random variable that can only have values of 0 or 1, i.e., X = 0, 1. Let the probability of X = 1 be p, i.e., $f_X(1) = p$; then $f_X(0) = 1-p$.
This can be summarized as
$$f_X(x) = p^x(1-p)^{1-x}, \quad x = 0, 1.$$
The mean value of the distribution is
$$\mu_X = 0\cdot(1-p) + 1\cdot p = p.$$
The expectation of $X^2$, $E(X^2)$, is needed to calculate the variance $\mathrm{Var}(X) = E(X^2) - \mu_X^2$. For the Bernoulli distribution,
$$E(X^2) = 0^2\cdot(1-p) + 1^2\cdot p = p,$$
and
$$\mathrm{Var}(X) = E(X^2) - \mu_X^2 = p - p^2 = p(1-p).$$
Thus, the standard deviation is $\sigma_X = [p(1-p)]^{1/2}$.
These results can be obtained using SCILAB as follows:
-->p=poly(0,'p')
p =
p
-->X = [0,1]
X =
! 0. 1. !
-->Prob = [1-p p]
Prob =
! 1 - p p !
-->muX = X*Prob'
muX = p
-->EX2 = X^2*Prob'
EX2 = p
-->VarX = EX2 - muX^2
VarX =
         2
    p - p
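The symbolic result can also be checked numerically for a particular value of p. The short sketch below is written in Python purely as an independent cross-check (it is not SCILAB code, and the function name bernoulli_moments is hypothetical); it sums $x f_X(x)$ and $x^2 f_X(x)$ over the support {0, 1}, exactly as in the derivation above.

```python
def bernoulli_moments(p):
    """Mean and variance of a Bernoulli(p) variable by direct summation over x = 0, 1."""
    support = [0, 1]
    pmf = [1 - p, p]                                   # f_X(0) = 1 - p, f_X(1) = p
    mean = sum(x * f for x, f in zip(support, pmf))    # E(X)
    ex2 = sum(x**2 * f for x, f in zip(support, pmf))  # E(X^2)
    return mean, ex2 - mean**2                         # Var(X) = E(X^2) - mu^2


mean, var = bernoulli_moments(0.3)
print(mean, var)  # should agree with p = 0.3 and p(1 - p) = 0.21
```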
The Bernoulli distribution applies to simple binary experiments in which only two outcomes are possible: 1 or 0, yes or no, success or failure. The probability of success, p, can be obtained, for example, from the classical or from the frequency definition of probability. Bernoulli processes form the basis of the binomial and geometric distributions presented below.
Binomial probability distribution
If a Bernoulli experiment with success probability p is repeated n times, the probability of having x successes out of the n trials is given by
$$f_X(x) = \binom{n}{x}p^x(1-p)^{n-x} = \frac{\Gamma(n+1)}{\Gamma(x+1)\,\Gamma(n-x+1)}\,p^x(1-p)^{n-x}, \quad x = 0, 1, 2, \ldots, n;\ 0 < p < 1,$$
with $\mu_X = np$, $\mathrm{Var}(X) = np(1-p)$, and $\sigma_X = [np(1-p)]^{1/2}$.
In SCILAB, we can define the probability mass function for the Binomial distribution as
-->deff('[f]=fX(x,n,p)',...
-->'f=gamma(n+1).*p.^x.*(1-p).^(n-x)./(gamma(x+1).*gamma(n-x+1))')
Next, we use this function to produce a plot of the probability mass function for n = 10, p = 0.10:
-->n=10; p=0.10; xx=[0:1:10]; yy = fX(xx,n,p);
-->xset('window',1);xset('mark',-9,2); plot2d(xx',yy',-9)
-->xtitle('Binomial pmf','x','fX(x)')
The following commands produce a plot of the cumulative distribution function:
-->yyy = [];for j = 1:n+1, yyy = [yyy sum(yy(1:j))]; end;
-->xset('window',2); xset('mark',-9,2); plot2d(xx',yyy',-9)
-->xtitle('Binomial cdf','x','FX(x)')
Poisson probability distribution:
If X is a Binomial variable with $n \to \infty$ and $p \to 0$, we calculate the parameter $\lambda = np$, and define the Poisson probability mass function as
$$f_X(x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots;\ \lambda > 0.$$
The Poisson pmf can be used to model the number of occurrences of a certain event in a given time period or per unit length, area, or volume, if $\lambda$ represents the mean number of occurrences of the event per unit time, length, area, or volume, respectively.
The mean, variance, and standard deviation of the Poisson distribution are $\mu_X = \lambda$, $\mathrm{Var}(X) = \lambda$, and $\sigma_X = \lambda^{1/2}$.
In SCILAB we can define the Poisson distribution pmf as:
-->deff('[p]=fX(x,lambda)','p=exp(-lambda).*lambda.^x./gamma(x+1)')
A plot of the pmf for $\lambda = 2.5$ for values of x between 0 and 20 is produced with:
-->lambda = 2.5; xx = [0:1:20]; yy = fX(xx,lambda);
-->xset('window',1);xset('mark',-9,2);plot2d(xx',yy',-9)
-->xtitle('Poisson pmf','x','fX(x)')
A plot of the corresponding cumulative distribution function follows:
-->yyy = []; for j = 1:21, yyy = [yyy sum(yy(1:j))]; end;
-->xset('window',2); xset('mark',-9,2); plot2d(xx',yyy',9)
-->xtitle('Poisson cdf','x','FX(x)')
Geometric probability distribution:
Suppose that a Bernoulli experiment with probability of success p is repeated until a successful outcome occurs. Let X represent the number of trials needed to obtain the first success (the successful trial included); then X can be modeled with the geometric pmf:
$$f_X(x) = p(1-p)^{x-1}, \quad x = 1, 2, \ldots;\ 0 < p < 1.$$
The mean, variance, and standard deviation of the geometric distribution are $\mu_X = 1/p$, $\mathrm{Var}(X) = (1-p)/p^2$, and $\sigma_X = (1-p)^{1/2}/p$.
The pmf for the geometric distribution is defined and plotted in SCILAB by using:
-->deff('[f]=fX(p,x)','f=p*(1-p).^(x-1)')
-->p = 0.25; xx = [1:1:20]; yy = fX(p,xx);
-->xset('window',1);xset('mark',-9,2);plot2d(xx',yy',-9)
-->xtitle('geometric pmf','x','fX(x)')
A plot of the geometric distribution CDF is shown next:
-->yyy = [];for j = 1:20, yyy = [yyy sum(yy(1:j))]; end;
-->xset('window',2);xset('mark',-9,2);plot2d(xx',yyy',-9)
-->xtitle('geometric cdf','x','FX(x)')
Hypergeometric probability mass function
Suppose that we have a finite population of N elements, out of which a < N elements are
defective. Suppose also that we take a sample of size n < N out of the population, and let X
represent the number of defective elements in the sample of size n. The probability of X is
given by the following pmf:
$$f_X(x; n, a, N) = \frac{\binom{a}{x}\binom{N-a}{n-x}}{\binom{N}{n}}, \quad x = 0, 1, \ldots, n;\ 0 < a < N,\ 0 < n < N.$$
Parameters of the distribution are: $\mu_X = na/N$, $\mathrm{Var}(X) = na(N-a)(N-n)/[N^2(N-1)]$.
To produce plots of the hypergeometric probability mass function and cumulative distribution
function, we first define a function accounting for the binomial coefficient:
-->deff('[CC]=C(n,r)','CC=gamma(n+1)./(gamma(r+1).*gamma(n-r+1))')
This function is incorporated in the definition of the hypergeometric function:
-->deff('[p]=fX(x)','p=C(a,x).*C(N-a,n-x)./C(N,n)')
Next, we produce plots of the hypergeometric pmf and CDF for N = 100, a = 25, and n = 20:
-->N=100;a=25;n=20;
-->xx=[0:1:20];yy=fX(xx);
-->xset('window',1);xset('mark',-9,2);
-->plot2d(xx',yy',-9);xtitle('Hypergeometric distribution','x','fX(x)');
-->yyy=[];for j=1:21, yyy=[yyy sum(yy(1:j))]; end;
-->xset('window',2);xset('mark',-9,2);
-->plot2d(xx',yyy',-9);xtitle('Hypergeometric distribution','x','FX(x)');
-->plot2d(xx',yyy',9)
Cumulative distribution functions for discrete probability distributions
Of the five probability distributions presented above, namely, Bernoulli, Binomial, Poisson, geometric, and hypergeometric, three (Bernoulli, Binomial, hypergeometric) take only a finite number of values, while the other two (Poisson and geometric) are defined over an infinite set of values. For the Binomial, Poisson, and hypergeometric distributions, the cumulative distribution function is calculated using
$$F_X(x) = \sum_{k=0}^{x} f_X(k),$$
where $f_X(x)$ represents the corresponding probability mass function. (This is the definition used to produce the CDF graphics shown in the previous examples.) The cumulative distribution function $F_X(x)$ is defined over the same range of values as the discrete random variable X.
For the geometric distribution, whose domain starts at x = 1, the corresponding expression is
$$F_X(x) = \sum_{k=1}^{x} f_X(k) = \sum_{k=1}^{x} p(1-p)^{k-1}, \quad x = 1, 2, 3, \ldots$$
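Because the geometric CDF is a finite geometric series, it can also be written in closed form as $F_X(x) = 1 - (1-p)^x$. The following Python sketch (an illustrative cross-check only; the helper names are hypothetical) compares the summed definition with this closed form for p = 0.5.

```python
def geom_pmf(k, p):
    """Geometric pmf: probability that the first success occurs on trial k (k = 1, 2, ...)."""
    return p * (1 - p) ** (k - 1)


def geom_cdf_sum(x, p):
    """F_X(x) as the sum of f_X(k) for k = 1, ..., x."""
    return sum(geom_pmf(k, p) for k in range(1, x + 1))


def geom_cdf_closed(x, p):
    """Closed form of the same finite geometric series: 1 - (1 - p)**x."""
    return 1 - (1 - p) ** x


for x in range(1, 11):
    assert abs(geom_cdf_sum(x, 0.5) - geom_cdf_closed(x, 0.5)) < 1e-12
print(geom_cdf_closed(6, 0.5))  # 0.984375
```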
SCILAB functions for discrete cumulative distribution functions
SCILAB provides a number of functions for operations with cumulative distribution functions.
For discrete distributions the following functions are provided:
cdfbin - Binomial distribution
cdfnbn - Negative binomial distribution
cdfpoi - Poisson distribution (described in detail in an earlier chapter)
Information on these functions can be obtained by using the help function. Next, we describe
the use of function cdfbin.
SCILAB function cdfbin
There are four different forms of the call to function cdfbin:
[P,Q]=cdfbin("PQ",S,Xn,Pr,Ompr)
[S]=cdfbin("S",Xn,Pr,Ompr,P,Q)
[Xn]=cdfbin("Xn",Pr,Ompr,P,Q,S)
[Pr,Ompr]=cdfbin("PrOmpr",P,Q,S,Xn)
The variable Pr in these calls represents the probability of success on any given trial, which we referred to as p in the definition of the Bernoulli pmf shown earlier. On the other hand, Ompr represents 1 - Pr (in some references this is referred to as q = 1 - p), i.e., the probability of failure in a given trial. The variable P represents the probability $P(X \le S)$, where X ~ Binomial(Xn,Pr), while Q = 1 - P.
The first argument in the calls to function cdfbin is a string that determines which variable is being sought, according to:
PQ - calculate the probabilities P = $P(X \le S)$ and Q = 1 - P
S - calculate the inverse CDF, i.e., calculate S from P = $P(X \le S)$
Xn - calculate the number of trials (n in the definition of the pmf)
PrOmpr - calculate the probability of success in any given trial (p in the pmf definition) and its complement
Care should be exercised in keeping the proper order of the variables in the calls to the
function.
Some examples follow:
-->n = 10; x = 6; p = 0.35; q = 1-p;
-->[P,Q] = cdfbin('PQ',x,n,p,q) //Calculating probabilities
Q =
.0260243
P =
.9739757
-->n=20;p=0.35;q=1-p;P=0.75;Q=1-P;
-->x = cdfbin("S",n,p,q,P,Q) //Calculating the inverse CDF
x =
7.9132062
-->[p,q] = cdfbin("PrOmpr",P,Q,x,n) //Calculating p and q = 1-p
q =
.7391494
p =
.2608506
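For an integer value of S, the "PQ" result can be reproduced by summing the binomial pmf directly. The Python sketch below (illustrative only; binom_cdf is a hypothetical name) does this for the first example above. It handles integer S only, whereas cdfbin also returns non-integer values of S, as in the inverse-CDF example.

```python
from math import comb


def binom_cdf(s, n, p):
    """P(X <= s) for X ~ Binomial(n, p), summing the pmf from 0 to s (integer s only)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(s + 1))


P = binom_cdf(6, 10, 0.35)
Q = 1 - P
print(P, Q)  # compare with P = .9739757 and Q = .0260243 from cdfbin
```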
Notes: Use help cdfnbn to learn more about the function that implements the negative
Binomial distribution. The function cdfpoi was described in detail in Chapter 13.
Discrete probability calculations through user-defined functions
Besides the few pre-programmed cumulative distribution functions provided by SCILAB,
probabilities can be calculated by defining probability mass and cumulative distribution
functions for the different distributions presented earlier. The basic definitions of
probabilities in terms of probability mass and cumulative distribution functions are:
$$P(X = x) = f_X(x) \quad \text{(pmf)}$$
$$P(X \le x) = F_X(x) = \sum_{k=0}^{x} f_X(k) \quad \text{(CDF for the Binomial, Poisson, and hypergeometric distributions)}$$
$$P(X \le x) = F_X(x) = \sum_{k=1}^{x} f_X(k) \quad \text{(CDF for the geometric distribution)}$$
We will define the following functions for the distributions shown earlier:
Distribution      pmf            CDF
Binomial          b(x,n,p)       B(x,n,p)
Poisson           p(x,lambda)    P(x,lambda)
geometric         g(x,p)         G(x,p)
hypergeometric    h(x,N,n,a)     H(x,N,n,a)
The following is a SCILAB script, called DiscreteProbabilityFunctions, which includes the
definitions for the eight function calls listed in the table immediately above:
//Defining discrete probability distributions
deff('[CC]=C(n,r)','CC=gamma(n+1)./(gamma(r+1).*gamma(n-r+1))') //Binomial coefficient
deff('[bb]=b(x,n,p)','bb=C(n,x).*p.^x.*(1-p).^(n-x)') //Binomial pmf
deff('[BB]=B(x,n,p)','BB=sum(b([0:1:x],n,p))') //Binomial CDF
deff('[pp]=p(x,lambda)','pp=exp(-lambda).*lambda.^x./gamma(x+1)') //Poisson pmf
deff('[PP]=P(x,lambda)','PP=sum(p([0:1:x],lambda))') //Poisson CDF
deff('[gg]=g(x,p)','gg=p.*(1-p).^(x-1)') //Geometric pmf
deff('[GG]=G(x,p)','GG=sum(g([1:x],p))') //Geometric CDF
deff('[hh]=h(x,N,n,a)','hh=C(a,x).*C(N-a,n-x)./C(N,n)') //Hypergeometric pmf
deff('[HH]=H(x,N,n,a)','HH=sum(h([0:1:x],N,n,a))') //Hypergeometric CDF
To execute the script that defines the discrete probability functions use:
-->exec('DiscreteProbabilityFunctions')
Combinations
The function C(n,r) represents combinations of n elements taken r by r, or the binomial
coefficient:
-->C(10,5)
ans =
252.
This is a vector of values of C(n,r) for n = 10 and r = 0, 1, ..., 10:
-->C10=[];for j=0:10,C10=[C10 C(10,j)]; end; C10
C10 =
! 1. 10. 45. 120. 210. 252. 210. 120. 45. 10. 1. !
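Python's standard library offers math.comb for exact binomial coefficients, which makes it easy to check the Gamma-function form of C(n,r) used in the SCILAB script. The sketch below (illustrative only; C mirrors the SCILAB function name) reproduces the vector C10.

```python
from math import comb, gamma


def C(n, r):
    """Binomial coefficient through the Gamma function, as in the SCILAB definition of C(n,r)."""
    return gamma(n + 1) / (gamma(r + 1) * gamma(n - r + 1))


C10 = [C(10, r) for r in range(11)]       # floating-point values from the Gamma form
exact = [comb(10, r) for r in range(11)]  # exact integers from math.comb
print(exact)  # [1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1]
assert all(abs(c - e) < 1e-9 for c, e in zip(C10, exact))
```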
Binomial distribution
For the binomial distribution with n = 10 and p = 0.25, the following call to function b(x,n,p)
calculates the probability P(X=2) = b(2,10,0.25):
-->b(2,10,0.25)
ans =
.2815676
The following is a list of values of the binomial pmf for n = 10, p = 0.25, for all possible values of x = 0, 1, ..., 10:
-->b10=[];for j=0:10,b10=[b10 b(j,10,0.25)]; end; b10
b10 =
column 1 to 7
! .0563135 .1877117 .2815676 .2502823 .145998 .0583992 .016222 !
column 8 to 11
! .0030899 .0003862 .0000286 9.537E-07 !
The binomial CDF for x = 2, n = 10, p = 0.25 is calculated with the following call to function B(x,n,p). This value represents $P(X \le 2)$:
-->B(2,10,0.25)
ans =
.5255928
This value represents $P(X > 2) = 1 - P(X \le 2)$:
-->1-B(2,10,0.25)
ans =
.4744072
The following is a list of values of the binomial CDF for n = 10, p = 0.25, for all values of x = 0, 1, ..., 10:
-->B10=[];for j=0:10,B10=[B10 B(j,10,0.25)]; end; B10
B10 =
column 1 to 7
! .0563135 .2440252 .5255928 .7758751 .9218731 .9802723 .9964943 !
column 8 to 11
! .9995842 .9999704 .9999990 1. !
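The binomial results above can be cross-checked with a few lines of Python. This sketch mirrors the script's b and B functions, but uses the exact integer binomial coefficient math.comb instead of the Gamma function; it is an illustration, not part of the SCILAB script. It also confirms that the mean of the pmf equals np = 2.5.

```python
from math import comb


def b(x, n, p):
    """Binomial pmf, P(X = x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def B(x, n, p):
    """Binomial CDF, P(X <= x), obtained by summing the pmf from 0 to x."""
    return sum(b(k, n, p) for k in range(x + 1))


n, p = 10, 0.25
print(b(2, n, p), B(2, n, p), 1 - B(2, n, p))  # compare with .2815676, .5255928, .4744072
mean = sum(k * b(k, n, p) for k in range(n + 1))  # should equal n*p = 2.5
print(mean)
```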
Poisson distribution
The pmf of the Poisson distribution can be used to calculate probabilities such as P(X=2) for $\lambda = 5.2$:
-->p(2,5.2)
ans =
.0745840
For P(X=6), the Poisson distribution with $\lambda = 5.2$ produces:
-->p(6,5.2)
ans =
.1514803
The cumulative distribution function for the Poisson distribution, with $\lambda = 5.2$, provides the probability $P(X \le 6)$:
-->P(6,5.2)
ans =
.7323933
The following SCILAB commands produce a vector of values of the Poisson CDF for x = 1, 2, ..., 10, and $\lambda = 5.2$:
-->P10=[];for j=1:10, P10=[P10 P(j,5.2)]; end; P10
P10 =
column 1 to 7
! .0342027 .1087867 .2380655 .406128 .580913 .7323933 .8449216 !
column 8 to 10
! .9180650 .9603256 .9823011 !
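A similar Python cross-check for the Poisson results (illustrative only; the names p and P simply mirror the SCILAB script) also confirms numerically that the mean and the variance of the Poisson distribution are both equal to $\lambda$. The infinite sums are truncated at k = 100, an assumption that is harmless for $\lambda = 5.2$ because the neglected terms are vanishingly small.

```python
from math import exp, factorial


def p(x, lam):
    """Poisson pmf, P(X = x)."""
    return exp(-lam) * lam**x / factorial(x)


def P(x, lam):
    """Poisson CDF, P(X <= x)."""
    return sum(p(k, lam) for k in range(x + 1))


lam = 5.2
print(p(2, lam), p(6, lam), P(6, lam))  # compare with .0745840, .1514803, .7323933

# Mean and variance, truncating the infinite sums at k = 100
mean = sum(k * p(k, lam) for k in range(101))
var = sum(k**2 * p(k, lam) for k in range(101)) - mean**2
print(mean, var)  # both should be very close to lambda = 5.2
```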
Geometric distribution
The probabilities P(X=3) and P(X=5) using the geometric distribution with p = 0.50 are
calculated as:
-->g(3,0.50)
ans =
.125
-->g(5,0.50)
ans =
.03125
The following example shows a way to calculate a vector of values of the geometric distribution pmf for x = 1, 2, ..., 10:
-->g([1:10],0.5)
ans =
column 1 to 9
! .5 .25 .125 .0625 .03125 .015625 .0078125 .0039063 .0019531 !
column 10
! .0009766 !
The following evaluations of the geometric distribution CDF are used to calculate the probabilities $P(X \le 6)$, $P(X \le 3)$, and $P(X \le 1)$, respectively:
-->G(6,0.5)
ans =
.984375
-->G(3,0.5)
ans =
.875
-->G(1,0.5)
ans =
.5
A vector of values of the geometric distribution CDF, with p = 0.5, is produced by using the following commands:
-->G10=[];for j=1:10, G10=[G10 G(j,0.5)]; end; G10
G10 =
column 1 to 9
! .5 .75 .875 .9375 .96875 .984375 .9921875 .9960938 .9980469 !
column 10
! .9990234 !
Hypergeometric distribution
The next line assigns values to the parameters N, n, and a in the hypergeometric distribution:
-->N=100;n=20;a=35;
The probability P(X=12) for the hypergeometric distribution with the parameters N, n, and a defined above is calculated as:
-->h(12,N,n,a)
ans =
.0078581
The cumulative distribution function for the hypergeometric distribution for x = 12 is calculated as follows:
-->H(12,N,n,a)
ans =
.9976693
The value just calculated represents the probability $P(X \le 12)$. The next statement generates a vector of values of the hypergeometric pmf for x = 0, 1, 2, ..., 20:
-->h([0:20],N,n,a)
ans =
column 1 to 7
! .0000529 .0008046 .0055295 .0228093 .0633073 .1256018 .1847085 !
column 8 to 14
! .2060210 .1768671 .1179114 .0613139 .0248839 .0078581 .0019176 !
column 15 to 21
! .0003575 .0000501 .0000051 3.698E-07 1.761E-08 4.924E-10 6.060E-12 !
The next line uses a loop to build a vector of values of the hypergeometric pmf for x = 1, 2, ..., 10 (replacing h by H inside the loop produces the corresponding CDF values instead):
-->H10=[];for j=1:10,H10=[H10 h(j,N,n,a)]; end; H10
H10 =
column 1 to 7
! .0008046 .0055295 .0228093 .0633073 .1256018 .1847085 .2060210 !
column 8 to 10
! .1768671 .1179114 .0613139 !
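The hypergeometric results can be reproduced in Python with exact binomial coefficients (math.comb). The sketch below is illustrative only; h and H mirror the names in the SCILAB script. It also builds the CDF for x = 1, ..., 10 by calling H inside the loop, and checks that the mean equals na/N = 7.

```python
from math import comb


def h(x, N, n, a):
    """Hypergeometric pmf: x defectives in a sample of n from N items containing a defectives."""
    return comb(a, x) * comb(N - a, n - x) / comb(N, n)


def H(x, N, n, a):
    """Hypergeometric CDF, P(X <= x)."""
    return sum(h(k, N, n, a) for k in range(x + 1))


N, n, a = 100, 20, 35
print(h(12, N, n, a), H(12, N, n, a))             # compare with .0078581 and .9976693
cdf_values = [H(j, N, n, a) for j in range(1, 11)]  # CDF for x = 1, ..., 10
mean = sum(k * h(k, N, n, a) for k in range(n + 1))  # should equal n*a/N = 7
print(cdf_values, mean)
```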
Continuous probability functions
In this section we describe several continuous probability distributions including the gamma,
exponential, beta, and Weibull distributions. Some of these distributions make use of the
Gamma function, $\Gamma(x)$, which is defined next.
__________________________________________________________________________________
Factorials and the Gamma function (see also Chapter 13)
The Gamma function is defined by
$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1}e^{-x}\,dx.$$
This function has the property that
$$\Gamma(\alpha) = (\alpha-1)\,\Gamma(\alpha-1), \quad \alpha > 1,$$
therefore, it can be related to the factorial of a number, i.e.,
$$\Gamma(\alpha) = (\alpha-1)!,$$
when $\alpha$ is a positive integer.
Factorials have applications in combinatorics (calculation of combinations and permutations,
etc.), and in some discrete probability distributions (e.g., binomial probability distribution),
while the gamma function has applications in continuous probability distributions (e.g., the
gamma probability distribution.)
__________________________________________________________________________________
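The relation between the Gamma function and the factorial can be checked quickly with Python's math.gamma and math.factorial (used here only for illustration; they are not SCILAB functions):

```python
from math import factorial, gamma, pi, sqrt

# Gamma(n) = (n - 1)! for positive integers n
for n in range(1, 11):
    assert abs(gamma(n) - factorial(n - 1)) <= 1e-9 * factorial(n - 1)

# Recurrence Gamma(alpha) = (alpha - 1) * Gamma(alpha - 1) at a non-integer alpha
alpha = 3.7
print(gamma(alpha), (alpha - 1) * gamma(alpha - 1))

# A non-integer special value: Gamma(1/2) = sqrt(pi)
print(gamma(0.5), sqrt(pi))
```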
The gamma distribution
The probability density function (pdf) for the gamma distribution is given by
$$f_X(x) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\,x^{\alpha-1}\exp\left(-\frac{x}{\beta}\right), \quad x > 0,\ \alpha > 0,\ \beta > 0.$$
The parameters $\alpha$ and $\beta$ are referred to, respectively, as the shape and scale parameters of the gamma distribution. Other parameters of this distribution are:
$$\mu_X = \alpha\beta, \quad \sigma_X^2 = \alpha\beta^2.$$
SCILAB provides function cdfgam for operations with the gamma distribution CDF. The calls to this function take the form
[P,Q]=cdfgam("PQ",X,Shape,Scale)
[X]=cdfgam("X",Shape,Scale,P,Q)
[Shape]=cdfgam("Shape",Scale,P,Q,X)
[Scale]=cdfgam("Scale",P,Q,X,Shape)
where P = P(XX<X), Q = 1 - P, and Shape = $\alpha$, with XX following the gamma distribution. The results below show that the argument Scale of cdfgam acts as a rate, i.e., it corresponds to $1/\beta$ in the pdf written above. For example, with $\alpha = 2$ and Scale = 3 ($\beta = 1/3$), $P(X<3) = 1 - e^{-9}(1+9) = 0.9987659$, which is the value returned by the second call below.
The following are examples of applications of function cdfgam. The following three calls determine, respectively, the probabilities P = P(X<10), P = P(X<3), and P = P(X<0.5), as well as the probabilities of the complement, Q = 1 - P, for the gamma distribution with $\alpha = 2$ and Scale = 3 (i.e., $\beta = 1/3$):
-->[P,Q]=cdfgam("PQ",10,2,3)
Q =
2.901E-12
P =
1.
-->[P,Q]=cdfgam("PQ",3,2,3)
Q =
.0012341
P =
.9987659
-->[P,Q]=cdfgam("PQ",0.5,2,3)
Q =
.5578254
P =
.4421746
The next call to function cdfgam calculates the inverse of the gamma CDF, i.e., the value of x for which P = P(X<x) = 0.4, where X follows the gamma distribution with $\alpha = 2$ and Scale = 3:
-->x=cdfgam('X',2,3,0.4,0.6)
x =
.4588071
The next call to the function is used to calculate the shape parameter, $\alpha$, given a probability P = P(X<0.3) = 0.6, Q = 1-P = 0.4, with X following the gamma distribution with Scale = 2 (i.e., $\beta = 1/2$):
-->alpha = cdfgam('Shape',2,0.6,0.4,0.3)
alpha =
.7190660
The next call to function cdfgam calculates the Scale argument (that is, $1/\beta$), given a probability P = P(X<1.2) = 0.2, Q = 1-P = 0.8, with X following the gamma distribution with $\alpha = 3$:
-->Scale = cdfgam('Scale',0.2,0.8,1.2,3)
Scale =
1.2792035
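The interpretation of cdfgam's Scale argument as a rate can be checked without special functions, because for shape $\alpha = 2$ the gamma CDF has the closed form $F_X(x) = 1 - e^{-x/\beta}(1 + x/\beta)$. The Python sketch below (illustrative only; gamma2_cdf is a hypothetical name, valid for $\alpha = 2$ only) reproduces two cdfgam results using $\beta = 1/3$, and also evaluates P(X<5) for $\alpha = 2$ with scale $\beta = 3$.

```python
from math import exp


def gamma2_cdf(x, beta):
    """CDF of the gamma distribution with shape alpha = 2 and scale beta (alpha = 2 only)."""
    z = x / beta
    return 1 - exp(-z) * (1 + z)


# cdfgam with Scale = 3 interpreted as the rate 1/beta, i.e., beta = 1/3
print(gamma2_cdf(3, 1 / 3))    # compare with P = .9987659 for [P,Q]=cdfgam("PQ",3,2,3)
print(gamma2_cdf(0.5, 1 / 3))  # compare with P = .4421746 for [P,Q]=cdfgam("PQ",0.5,2,3)
# The same formula with scale beta = 3
print(gamma2_cdf(5, 3))        # P(X<5) for alpha = 2, beta = 3
```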
The exponential distribution
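The exponential distribution is the gamma distribution with $\alpha = 1$. Its pdf is given by
$$f_X(x) = \frac{1}{\beta}\exp\left(-\frac{x}{\beta}\right), \quad x > 0,\ \beta > 0,$$
while its CDF is given by
$$F_X(x) = 1 - \exp\left(-\frac{x}{\beta}\right), \quad x > 0,\ \beta > 0.$$
Parameters of the exponential distribution include:
$$\mu_X = \beta, \quad \sigma_X^2 = \beta^2.$$
The beta distribution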
The pdf for the beta distribution is given by
$$f_X(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1}, \quad 0 < x < 1,\ \alpha > 0,\ \beta > 0.$$
As in the case of the gamma distribution, the corresponding CDF for the beta distribution is given by an integral that, in general, has no closed-form solution.
The parameters of the beta distribution include
$$\mu_X = \frac{\alpha}{\alpha+\beta}, \quad \mathrm{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}.$$
SCILAB provides function cdfbet for operations with the cumulative distribution function of the beta distribution. Calls to the function are the following:
[P,Q]=cdfbet("PQ",X,Y,A,B)
[X,Y]=cdfbet("XY",A,B,P,Q)
[A]=cdfbet("A",B,P,Q,X,Y)
[B]=cdfbet("B",P,Q,X,Y,A)
In these calls P = P(XX<X), Y = 1 - X, Q = 1 - P, and A, B are the parameters $\alpha$ and $\beta$ of the beta distribution.
Next, we present some applications of function cdfbet. The first example calculates the probability P(X<0.35) for the beta distribution with $\alpha = 2$, $\beta = 3$:
-->[P,Q]=cdfbet('PQ',0.35,1-0.35,2,3)
Q =
.5629813
P =
.4370187
An example that calculates the inverse of the beta CDF, i.e., the value of x for which P = P(X<x) = 0.75, for the beta distribution with $\alpha = 3$, $\beta = 5$ is presented next:
-->[X,Y] = cdfbet("XY",3,5,0.75,1-0.75)
Y =
.5139030
X =
.4860970
The next two examples show how to obtain a parameter of the beta distribution from the remaining quantities. The first application uses X = 0.3, Y = 1-X = 0.7, P = P(X<0.3) = 0.4, Q = 1-P = 0.6, and $\beta = 3.5$ to obtain $\alpha$; the second uses X = 0.8, Y = 0.2, P = P(X<0.8) = 0.6, Q = 0.4, and $\alpha = 1.5$ to obtain $\beta$:
-->alpha = cdfbet("A",3.5,0.4,0.6,0.3,0.7)
alpha =
2.0459494
-->beta = cdfbet("B",0.6,0.4,0.8,0.2,1.5)
beta =
.7453948
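For integer parameters the beta CDF reduces to a polynomial. With $\alpha = 2$ and $\beta = 3$ the pdf is $12x(1-x)^2$, whose integral from 0 to x is $6x^2 - 8x^3 + 3x^4$. The Python sketch below (illustrative only; beta23_cdf is a hypothetical name valid for these parameter values only) evaluates that polynomial at x = 0.35 for comparison with the first cdfbet result.

```python
def beta23_cdf(x):
    """CDF of the beta distribution with alpha = 2, beta = 3, for 0 <= x <= 1."""
    return 6 * x**2 - 8 * x**3 + 3 * x**4


print(beta23_cdf(0.35))  # compare with P = .4370187 from cdfbet
print(beta23_cdf(1.0))   # total probability
```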
The Weibull distribution
The pdf for the Weibull distribution is given by
$$f_X(x) = \alpha\beta x^{\beta-1}\exp(-\alpha x^{\beta}), \quad x > 0,\ \alpha > 0,\ \beta > 0,$$
while the corresponding CDF is given by
$$F_X(x) = 1 - \exp(-\alpha x^{\beta}), \quad x > 0,\ \alpha > 0,\ \beta > 0.$$
Parameters of this distribution are:
$$\mu_X = \alpha^{-1/\beta}\,\Gamma\left(1 + \frac{1}{\beta}\right), \quad \mathrm{Var}(X) = \alpha^{-2/\beta}\left[\Gamma\left(1 + \frac{2}{\beta}\right) - \Gamma^2\left(1 + \frac{1}{\beta}\right)\right].$$
The uniform distribution
The uniform distribution for a continuous random variable is defined for values of X such that a < x < b. The corresponding probability density function is given by
$$f_X(x) = \frac{1}{b-a}, \quad a < x < b.$$
The cumulative distribution function is
$$F_X(x) = \frac{x-a}{b-a}, \quad a < x < b.$$
The parameters of the uniform distribution are:
$$\mu_X = \frac{a+b}{2}, \quad \mathrm{Var}(X) = \frac{(b-a)^2}{12}.$$
The following function definition implements the cumulative distribution function for the uniform distribution in SCILAB:
-->deff('[FF]=FX(x)','FF=(x-a)/(b-a)')
For values of a = 2.5 and b = 3.2, we proceed to calculate some probabilities:
-->a = 2.5; b = 3.2;
First, we calculate $P(X<2.7) = F_X(2.7)$:
-->FX(2.7)
ans =
.2857143
Next, we calculate $P(X>3) = 1 - P(X<3) = 1 - F_X(3)$:
-->1-FX(3)
ans =
.2857143
The following example calculates $P(2.8<X<3) = P(X<3) - P(X<2.8) = F_X(3) - F_X(2.8)$:
-->FX(3)-FX(2.8)
ans =
.2857143
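The same three probabilities can be checked with a Python sketch of the uniform CDF (illustrative only; uniform_cdf is a hypothetical name). Unlike the one-line SCILAB function FX, this version also returns 0 below a and 1 above b, a design choice added here so that the function is a valid CDF over the whole real line. All three probabilities equal 0.2/0.7 = 2/7.

```python
def uniform_cdf(x, a, b):
    """CDF of the uniform distribution on (a, b); 0 below a and 1 above b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


a, b = 2.5, 3.2
results = (
    uniform_cdf(2.7, a, b),                         # P(X < 2.7)
    1 - uniform_cdf(3, a, b),                       # P(X > 3)
    uniform_cdf(3, a, b) - uniform_cdf(2.8, a, b),  # P(2.8 < X < 3)
)
print(results)  # each is approximately 2/7 = 0.2857143
```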
User-defined functions for continuous probability distributions
The following SCILAB script defines the probability density function and the cumulative distribution function for four selected continuous distributions: gamma, exponential, beta, and Weibull.
The script is called ContinuousProbabilityFunctions, and is invoked by using:
-->exec('ContinuousProbabilityFunctions')
The listing of the script is the following:
//Define selected continuous probability functions
deff('[gg]=gam(x,a,b)','gg=x.^(a-1).*exp(-x./b)./(b.^a.*gamma(a))')
deff('[GG]=GAM(x,a,b)','GG=intg(0,x,gam)')
deff('[ee]=eex(x,b)','ee=exp(-x./b)./b')
deff('[EE]=EEX(x,b)','EE=1-exp(-x./b)')
deff('[bb]=bet(x,a,b)',...
'bb=gamma(a+b).*x.^(a-1).*(1-x).^(b-1)./(gamma(a).*gamma(b))')
deff('[BB]=BET(x,a,b)','BB=intg(0,x,bet)')
deff('[ww]=w(x,a,b)','ww=a.*b.*x.^(b-1).*exp(-a.*x.^b)')
deff('[WW]=W(x,a,b)','WW=1-exp(-a.*x.^b)')
The functions defined through the script are summarized in the following table:
Distribution      pdf            CDF
gamma             gam(x,α,β)     GAM(x,α,β)
exponential       eex(x,β)       EEX(x,β)
beta              bet(x,α,β)     BET(x,α,β)
Weibull           w(x,α,β)       W(x,α,β)
Applications of these functions follow, starting with the gamma distribution.
The gamma distribution
First, we plot the pdf of the distribution using $\alpha = 2$ and $\beta = 3$:
-->xx=(0:0.1:20);yy=gam(xx,2,3);
-->plot(xx,yy,'x','fX(x)','gamma distribution')
A plot of the gamma distribution CDF for $\alpha = 2$ and $\beta = 3$ is obtained by using:
-->yyy=[];for x=0:0.1:20, yyy=[yyy GAM(x,2,3)]; end;
-->plot(xx,yyy,'x','FX(x)','gamma distribution')
The CDF can be used to calculate probabilities. The next three lines calculate the probabilities $P(X<5) = F_X(5)$, $P(6<X<11) = F_X(11) - F_X(6)$, and $P(X>7.5) = 1 - P(X<7.5) = 1 - F_X(7.5)$:
-->GAM(5,2,3)
ans = .4963317
-->GAM(11,2,3)-GAM(6,2,3)
ans = .2867187
-->1-GAM(7.5,2,3)
ans = .2872975
The exponential distribution
The following commands generate plots of the pdf and CDF for the exponential distribution using $\beta = 2.5$:
-->xx=(0:0.1:20);yy=eex(xx,2.5);
-->plot(xx,yy,'x','fX(x)','exponential distribution')
-->yyy=[];for x=0:0.1:20, yyy=[yyy EEX(x,2.5)]; end;
-->plot(xx,yyy,'x','FX(x)','exponential distribution')
The following probability calculations for the exponential distribution with $\beta = 2.5$ are presented next: $P(X<6) = F_X(6)$, $P(X>4) = 1 - P(X<4) = 1 - F_X(4)$, and $P(4<X<6) = F_X(6) - F_X(4)$:
-->EEX(6,2.5)
ans =
.9092820
-->1-EEX(4,2.5)
ans =
.2018965
-->EEX(6,2.5)-EEX(4,2.5)
ans =
.1111786
The beta distribution
To plot the pdf and CDF of the beta distribution with $\alpha = 2.5$, $\beta = 3.5$, we use:
-->xx=(0:0.05:1);yy=bet(xx,2.5,3.5);
-->plot(xx,yy,'x','fX(x)','beta distribution')
-->yyy=[];for x=0:0.05:1, yyy=[yyy BET(x,2.5,3.5)]; end;
-->plot(xx,yyy,'x','FX(x)','beta distribution')
The following probability calculations for the beta distribution with $\alpha = 2.5$ and $\beta = 3.5$ are presented next: $P(X<0.25) = F_X(0.25)$, $P(X>0.75) = 1 - P(X<0.75) = 1 - F_X(0.75)$, and $P(0.3<X<0.8) = F_X(0.8) - F_X(0.3)$:
-->BET(0.25,2.5,3.5)
ans =
.2092843
-->1-BET(0.75,2.5,3.5)
ans =
.0438857
-->BET(0.8,2.5,3.5)-BET(0.3,2.5,3.5)
ans =
.6816871
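The functions GAM and BET obtain the CDF by numerically integrating the pdf with intg. The same idea can be sketched in Python with composite Simpson's rule; this is a swapped-in quadrature rule chosen for brevity, not the method used by intg, and the names bet, simpson, and BET are illustrative. The pdf used is the beta pdf $\Gamma(\alpha+\beta)x^{\alpha-1}(1-x)^{\beta-1}/[\Gamma(\alpha)\Gamma(\beta)]$. The sketch assumes $\alpha \ge 1$ and $\beta \ge 1$ so that the integrand stays finite at the end points.

```python
from math import gamma


def bet(x, a, b):
    """Beta pdf with parameters a (alpha) and b (beta)."""
    return gamma(a + b) * x ** (a - 1) * (1 - x) ** (b - 1) / (gamma(a) * gamma(b))


def simpson(f, lo, hi, m=2000):
    """Composite Simpson's rule on [lo, hi] with an even number m of subintervals."""
    h = (hi - lo) / m
    total = f(lo) + f(hi)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def BET(x, a, b):
    """Beta CDF: integral of the pdf from 0 to x."""
    return simpson(lambda t: bet(t, a, b), 0.0, x)


a, b = 2.5, 3.5
print(BET(0.25, a, b), 1 - BET(0.75, a, b), BET(0.8, a, b) - BET(0.3, a, b))
print(BET(1.0, a, b))  # total probability, should be very close to 1
```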
The Weibull distribution
Plots of the pdf and CDF for the Weibull distribution with $\alpha = 2$ and $\beta = 3$ are obtained as follows:
-->xx=(0:0.01:2);yy=w(xx,2,3);
-->plot(xx,yy,'x','fX(x)','Weibull distribution')
-->yyy=[];for x=0:0.01:2, yyy=[yyy W(x,2,3)]; end;
-->plot(xx,yyy,'x','FX(x)','Weibull distribution')
The following probability calculations for the Weibull distribution with $\alpha = 2$ and $\beta = 3$ are presented next: $P(X<1.5) = F_X(1.5)$, $P(X>0.6) = 1 - P(X<0.6) = 1 - F_X(0.6)$, and $P(0.5<X<1.2) = F_X(1.2) - F_X(0.5)$:
-->W(1.5,2,3)
ans =
.9988291
-->1-W(0.6,2,3)
ans =
.6492094
-->W(1.2,2,3)-W(0.5,2,3)
ans =
.7472451
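The Weibull probabilities can be checked in Python (an illustrative sketch; W mirrors the script, simpson is a hypothetical helper). The sketch also compares the mean formula $\mu_X = \alpha^{-1/\beta}\Gamma(1 + 1/\beta)$ with a numerical evaluation of $\int_0^\infty [1 - F_X(x)]\,dx$, which equals the mean of a non-negative random variable. The integral is truncated at x = 5, an assumption that is safe here because $1 - F_X(5) = e^{-250}$.

```python
from math import exp, gamma


def W(x, a, b):
    """Weibull CDF in the form used above: F(x) = 1 - exp(-a * x**b)."""
    return 1 - exp(-a * x**b)


def simpson(f, lo, hi, m=2000):
    """Composite Simpson's rule with an even number m of subintervals."""
    h = (hi - lo) / m
    total = f(lo) + f(hi)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


a, b = 2, 3
probs = (W(1.5, a, b), 1 - W(0.6, a, b), W(1.2, a, b) - W(0.5, a, b))
print(probs)  # compare with .9988291, .6492094, .7472451

# Mean from the formula, and numerically as the integral of 1 - F(x) over 0 < x < 5
mean_formula = a ** (-1 / b) * gamma(1 + 1 / b)
mean_numeric = simpson(lambda x: 1 - W(x, a, b), 0.0, 5.0)
print(mean_formula, mean_numeric)
```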
Continuous probability distributions used in statistical inference
Statistical inference is the process by which sample data is used to provide information about
the population. Some of the products of statistical inference are the generation of confidence
intervals and the test of hypotheses for population parameters. There are a number of
continuous probability distributions of great utility in statistical inference. These are:
the standard normal distribution
the Student's t distribution
the Chi-square ($\chi^2$) distribution
the F distribution
The probability density functions (pdf) for these distributions are presented below:
The Normal distribution
The expression for the normal distribution pdf is:
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \quad -\infty < x < \infty,$$
where $\mu$ is the mean, and $\sigma^2$ the variance of the distribution.
SCILAB provides function cdfnor for operations with the cumulative distribution function for the normal distribution. Function cdfnor was presented in detail in an earlier chapter. To find on-line information on this function use the command:
-->help cdfnor
The Student-t distribution
The Student-t, or simply, the t-, distribution has one parameter $\nu$, known as the degrees of freedom. The probability density function (pdf) is given by
$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \quad -\infty < t < \infty.$$
The following SCILAB commands can be used to plot the pdf for the Student t distribution with $\nu = 6$:
-->deff('[f]=fT(t,nu)',...
-->'f=gamma((nu+1)./2).*(1+t.^2./nu).^(-(nu+1)/2)/(sqrt(%pi*nu)*gamma(nu/2))')
-->tt=[-4:0.1:4];ff=fT(tt,6);
-->plot(tt,ff,'t','fT(t)','Student t - nu = 6')
SCILAB provides function cdft for operations with the cumulative distribution function of the Student's t distribution. The calls to the function are as follows:
[P,Q]=cdft("PQ",T,Df)
[T]=cdft("T",Df,P,Q)
[Df]=cdft("Df",P,Q,T)
In these function calls, P = P(TT<T), Q = 1 - P, and Df = degrees of freedom = $\nu$, with TT ~ Student t(Df).
-->[P,Q] = cdft("PQ",0.4,6) //Probability calculation
Q =
.3515041
P =
.6484959
-->t = cdft("T",8,0.45,1-0.45) //Inverse CDF calculation
t =
- .1297073
-->nu = cdft("Df",0.7,0.3,0.8) //Obtaining degrees of freedom
nu =
.7716700
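The first cdft result can be reproduced by integrating the pdf numerically. The Python sketch below (illustrative only; fT mirrors the SCILAB function and t_cdf is a hypothetical name) uses the symmetry of the t distribution, $P(T<t) = 0.5 + \int_0^t f(s)\,ds$, together with Simpson's rule; the integral is signed, so negative t is handled as well.

```python
from math import gamma, pi, sqrt


def fT(t, nu):
    """Student t pdf with nu degrees of freedom (same expression as the SCILAB fT)."""
    return gamma((nu + 1) / 2) * (1 + t**2 / nu) ** (-(nu + 1) / 2) / (sqrt(pi * nu) * gamma(nu / 2))


def t_cdf(t, nu, m=2000):
    """P(T < t) as 0.5 plus the integral of the pdf from 0 to t (Simpson's rule, m even)."""
    h = t / m
    total = fT(0.0, nu) + fT(t, nu)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * fT(i * h, nu)
    return 0.5 + total * h / 3


print(t_cdf(0.4, 6))  # compare with P = .6484959 from cdft
```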
A plot of the CDF for the Student t distribution can be produced using the following commands:
-->xx=[-4:0.1:4];
-->yy=[];for x=-4:0.1:4, yy=[yy cdft('PQ',x,6)]; end;
-->plot(xx,yy,'t','FT(t)','Student t - nu = 6')
The Chi-squared ($\chi^2$) distribution
The Chi-squared ($\chi^2$) distribution has one parameter $\nu$, known as the degrees of freedom. The probability density function (pdf) is given by
$$f(x) = \frac{1}{2^{\nu/2}\,\Gamma\left(\frac{\nu}{2}\right)}\,x^{\frac{\nu}{2}-1}e^{-x/2}, \quad \nu > 0,\ x > 0.$$
A plot of the CDF for the Chi-square distribution with $\nu = 4$ can be obtained by using:
-->xx = [0:0.1:10];
-->yy=[];for x=0:0.1:10, yy=[yy cdfchi('PQ',x,4)]; end;
-->plot(xx,yy,'x','FX(x)','Chi-square - nu = 4')
SCILAB provides function cdfchi for operations with the cumulative distribution function of the $\chi^2$ (chi-square) distribution. The calls to this function include:
[P,Q]=cdfchi("PQ",X,Df)
[X]=cdfchi("X",Df,P,Q)
[Df]=cdfchi("Df",P,Q,X)
In these calls to function cdfchi, P = P(XX<X), Q = 1 - P, and Df = degrees of freedom, with XX ~ $\chi^2$(Df).
-->[P,Q] = cdfchi("PQ",1,10) //Probability calculation
Q =
.9998279
P =
.0001721
-->[P,Q] = cdfchi("PQ",0.2,10) //Probability calculation
Q =
.9999999
P =
7.668E-08
-->chi2 = cdfchi("X",4,0.4,0.6) //Inverse CDF calculation
chi2 =
2.7528427
-->nu = cdfchi("Df",0.4,0.6,2.7) //Calculating degrees of freedom
nu =
3.9409085
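When the number of degrees of freedom $\nu$ is even, the chi-square CDF has the closed form $F(x) = 1 - e^{-x/2}\sum_{j=0}^{\nu/2-1} (x/2)^j/j!$. The Python sketch below (illustrative only; chi2_cdf_even is a hypothetical name and handles even $\nu$ only) uses it to check the first cdfchi probability and the inverse-CDF result for $\nu = 4$.

```python
from math import exp, factorial


def chi2_cdf_even(x, nu):
    """P(X < x) for a chi-square variable with an even number nu of degrees of freedom."""
    if nu % 2 != 0:
        raise ValueError("this sketch handles even degrees of freedom only")
    half = x / 2
    return 1 - exp(-half) * sum(half**j / factorial(j) for j in range(nu // 2))


print(chi2_cdf_even(1, 10))         # compare with P = .0001721
print(chi2_cdf_even(2.7528427, 4))  # compare with P = 0.4 used in the inverse calculation
```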
A plot of the pdf for the Chi-square distribution with $\nu = 10$ is obtained by using:
-->deff('[f]=fC(x,nu)',...
-->'f=x.^(nu/2-1).*exp(-x./2)/(2.^(nu/2).*gamma(nu./2))')
-->cc=[0:0.1:30];ff=fC(cc,10);
-->plot(cc,ff,'chi^2','fC(chi^2)','Chi-square - nu = 10')
The F distribution
The F distribution has two parameters, $\nu_N$ = numerator degrees of freedom, and $\nu_D$ = denominator degrees of freedom. The probability density function (pdf) is given by
$$f(x) = \frac{\Gamma\left(\frac{\nu_N+\nu_D}{2}\right)\left(\frac{\nu_N}{\nu_D}\right)^{\nu_N/2}x^{\frac{\nu_N}{2}-1}}{\Gamma\left(\frac{\nu_N}{2}\right)\Gamma\left(\frac{\nu_D}{2}\right)\left(1 + \frac{\nu_N x}{\nu_D}\right)^{\frac{\nu_N+\nu_D}{2}}}, \quad \nu_D > 0,\ \nu_N > 0,\ x > 0.$$
A plot of the F-distribution pdf for $\nu_N = 4$, $\nu_D = 6$, is obtained by using:
-->deff('[f]=fF(F,nuN,nuD)',...
-->'f=gamma((nuN+nuD)./2).*(nuN./nuD).^(nuN./2).*F.^(nuN./2-1)./'+...
-->'(gamma(nuN./2).*gamma(nuD./2).*(1+nuN.*F./nuD).^((nuN+nuD)./2))')
-->xx=[0:0.1:10];ff=fF(xx,4,6);
-->plot(xx,ff,'F','fF(F)','F distribution - nuNum = 4 - nuDen = 6')
SCILAB provides the function cdff for operations with the cumulative distribution function of the F distribution. The calls to the function are:
[P,Q]=cdff("PQ",F,Dfn,Dfd)
[F]=cdff("F",Dfn,Dfd,P,Q)
[Dfn]=cdff("Dfn",Dfd,P,Q,F)
[Dfd]=cdff("Dfd",P,Q,F,Dfn)
In these calls of the function cdff, P = P(FF<F), Q = 1 - P, and Dfn and Dfd = degrees of freedom in the numerator and denominator of F.
-->[P,Q] = cdff("PQ",1.2,6,12) //Probability calculation
Q =
.3697351
P =
.6302649
-->F = cdff("F",10,2,0.4,0.6) //Inverse CDF calculation
F =
.9944093
-->nuNum= cdff('Dfn',5,0.4,0.6,0.8) //calculating degrees of freedom
nuNum =
5.3847039
A plot of the F-distribution CDF is produced through the following SCILAB commands:
-->xx = [0:0.1:10];
-->yy=[];for x=0:0.1:10, yy=[yy cdff('PQ',x,4,6)]; end;
-->plot(xx,yy,'t','fX(t)','F - nuNum = 4 - nuDen = 6')
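The same integral check works for the F distribution. Integrating the pdf from 0 to 1.2 with νN = 6 and νD = 12 should reproduce P = .6302649 from cdff("PQ",1.2,6,12). The sketch below assumes a recent Scilab release. fF612 is a hypothetical helper with both degrees of freedom fixed. The sketch also illustrates that cdff accepts vector arguments of equal size, so the for loop used for the CDF plot can be replaced by a single call.

```scilab
// F pdf with nuN = 6 and nuD = 12 fixed (hypothetical helper name)
function f = fF612(x)
    nuN = 6; nuD = 12;
    f = gamma((nuN+nuD)/2) * (nuN/nuD)^(nuN/2) * x.^(nuN/2 - 1) ./ ...
        (gamma(nuN/2) * gamma(nuD/2) * (1 + nuN*x/nuD).^((nuN+nuD)/2));
endfunction

// Area under the pdf between 0 and 1.2, compared with cdff
Pint = intg(0, 1.2, fF612);
[P, Q] = cdff("PQ", 1.2, 6, 12);
mprintf("integral of pdf = %.7f   cdff P = %.7f\n", Pint, P);

// Vectorized CDF for nuN = 4, nuD = 6: all arguments have the same size
xx = 0:0.1:10;
nx = length(xx);
yy = cdff("PQ", xx, 4*ones(1, nx), 6*ones(1, nx));
```

The vector yy then holds the same values that the loop above accumulates, and it can be passed directly to the plot command.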
Applications of the normal distribution in data analysis
The normal distribution, also known as the bell curve, appears commonly when determining
the frequency distribution of different types of physical measurements. We first introduced
the normal distribution in Chapter 14 as an example of a continuous probability distribution. In
this section we present some applications of this probability distribution in data analysis.
The probability density function, pdf, for a general normal distribution, X, with a mean value, μ, and a standard deviation, σ, is given by
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \qquad \sigma > 0,\ -\infty < x < \infty.$$
The standard normal distribution has mean value μ = 0 and standard deviation σ = 1.
SCILAB provides function cdfnor for operations with the normal cumulative distribution function. The different forms of the call to the function were presented in detail earlier, and are repeated here:
[p,q] = cdfnor("PQ",x,mu,sigma)
[x] = cdfnor("X",mu,sigma,p,q)
[mu] = cdfnor("Mean",sigma,p,q,x)
[sigma] = cdfnor("Std",p,q,x,mu)
where mu is the mean value (μ), sigma is the standard deviation (σ), p = P(X<x), and q = 1 - p = P(X>x). The first argument in the different calls to cdfnor is a string that indicates the type of result expected:
"PQ" - to request probabilities p and q
"X" - to request a value of the normal variable
"Mean" - to request the mean of the distribution
"Std" - to request the standard deviation of the distribution
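The relationship between a general normal variable and the standard normal variable, Z = (X-μ)/σ, can be seen directly with cdfnor. The sketch below assumes a recent Scilab release. The values μ = 150, σ = 50 and x = 230 are illustrative and do not come from the text. The "PQ" calls on x and on z = (x-μ)/σ give the same probability, and the "X" and "Mean" calls invert it.

```scilab
mu = 150; sigma = 50;   // illustrative parameters
x  = 230;               // illustrative value of X
// P(X < x) computed directly with mean mu and standard deviation sigma
[p, q] = cdfnor("PQ", x, mu, sigma);
// The same probability from the standard normal variable z = (x - mu)/sigma
z = (x - mu) / sigma;
[pz, qz] = cdfnor("PQ", z, 0, 1);
// Inverse calls: recover x, then the mean, from the probabilities
xback  = cdfnor("X", mu, sigma, p, q);
muback = cdfnor("Mean", sigma, p, q, x);
mprintf("p = %.7f  pz = %.7f  x = %.4f  mu = %.4f\n", p, pz, xback, muback);
```

Here z = 1.6, so p is the standard normal probability P(Z<1.6), about 0.9452.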
Because the normal distribution is commonly found in the analysis of physical measurements, it is often recommended that you check whether your data set (your sample) follows the normal distribution. In this section we present two graphical approaches for checking whether your data follow the normal distribution. The first consists of superimposing a normal distribution pdf, based on the mean value and standard deviation of the sample, on top of the sample histogram. The second approach consists of plotting the data against what are commonly known as their normal scores. The resulting graph is equivalent to plotting the data on normal probability paper, i.e., paper with one scale representing the normal probability corresponding to the data set. These two approaches are described next.
Plotting a histogram and its corresponding normal curve
The purpose of this plot is to visually check if the histogram of a sample, with a suitable
number of classes, matches a superimposed normal curve. For that purpose we propose the
following SCILAB user-defined function, histnorm:
function [chi2,cmark,fcount]=histnorm(x, xclass)
//This function calculates the frequency distribution
//for the data in (row) vector x according to the
//class boundaries contained in the (row) vector
//xclass. It also produces a histogram of the
//data and the normal curve that best fits the data.
//
//Typical call: [chi2,cm,f] = histnorm(x,xclass)
//where cm = class marks, f = frequency count,
// chi2 = chi-square parameter for the fitting
[m n] = size(x); //Sample size
[m nB] = size(xclass); //Number of class boundaries
k = nB - 1; //Number of classes
//Calculate class marks
cmark = zeros(1,k);
for ii = 1:k
cmark(ii) = 0.5*(xclass(ii)+xclass(ii+1));
end
//Initialize frequency counts to zero
fcount=zeros(1,k);
fbelow=0; fabove=0;
//Accumulate frequency counts
for ii = 1:n
if x(ii) < xclass(1)
fbelow = fbelow + 1;
elseif x(ii) > xclass(nB)
fabove = fabove + 1;
else
for jj = 1:k
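//(a value equal to the last boundary is counted in the last class)
if (x(ii)>= xclass(jj) & x(ii)< xclass(jj+1)) | (jj==k & x(ii)==xclass(nB))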
fcount(jj) = fcount(jj) +1;
end
end
end
end
//define normal CDF, calculate xbar, sx, chi-square parameter
nn = sum(fcount);
xbar = mean(x); sx = st_deviation(x);
xmin = min(xclass); xmax = max(xclass);
pk = [];
for j = 1:k+1
pk = [pk cdfnor("PQ",xclass(j),xbar,sx)];
end;
p_in_classes = pk(k+1)-pk(1);
pxclass = pk(2:k+1) - pk(1:k);
fc = pxclass*nn/p_in_classes; //expected counts, rescaled to sum to nn
//Chi square parameter
chi2=0;
for j = 1:length(fc)
chi2 = chi2 + (fcount(j)-fc(j))^2/fc(j);
end;
//Produce normal distribution for data
Dx = (xmax-xmin)/100;
xx = [xmin:Dx:xmax];
xxx = xx(1:100) + Dx/2;
pkk = [];
for j = 1:101
pkk = [pkk cdfnor("PQ",xx(j),xbar,sx)];
end;
pp = pkk(2:101) - pkk(1:100);
fcc = pp*nn*100/(k*p_in_classes); //curve scaled to counts per class width
//Determine plot rectangle
ymin = 0;
ymaxf = max(fcount); ymaxy = max(fcc);
ymax = max(ymaxf,ymaxy);
ymax = int(1.1*ymax);
plotrectangle = [xmin ymin xmax ymax];
//plot the histogram and normal curve
xp = xclass(1:k);
xset('window',1);xbasc(1);
plot2d2('onn',xclass',[fcount fcount(k)]',[1],'011','y',[xmin ymin xmax ymax]);
plot2d3('onn',xp',fcount',[1],'000');
plot2d(xxx',fcc',[2],'000');
xtitle('Histogram with normal curve','x','frequency');
//end function histnorm
Notice that this function uses SCILAB function cdfnor to calculate values of the cumulative distribution function for the normal distribution where needed. The general call to the function is:
[chi2,cm,f] = histnorm(x,xclass)
which returns, in general, the class marks, cm, the frequency count, f, and a chi-square parameter defined as
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - fc_i)^2}{fc_i},$$
where $f_i$ is the actual frequency count for the ith class, $fc_i$ is the estimated frequency count obtained from the normal distribution for the ith class, and k is the number of classes in the frequency distribution.
The χ² parameter approximately follows the chi-square distribution with ν = k-1 degrees of freedom. It is used to check the hypothesis that the frequency distribution under consideration indeed follows the normal distribution. (When, as in histnorm, the mean and standard deviation of the normal curve are estimated from the same sample, ν = k-3 degrees of freedom are commonly used instead.) The subject of hypothesis testing is developed in a later chapter; therefore, we delay until then the use of the parameter returned from function histnorm.
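The chi-square parameter can also be computed on its own, without the plotting code. The sketch below assumes a recent Scilab release. It uses a hypothetical grouped sample: the boundaries, counts, mean and standard deviation are chosen only for illustration. As an assumption of this sketch, the expected counts are divided by the total normal probability of the classes, so that they add up to the number of observations inside the classes.

```scilab
// Hypothetical grouped sample (illustrative values only)
xclass = [0 10 20 30 40];        // class boundaries
fcount = [12 30 38 20];          // observed frequency in each class
xbar = 21; sx = 9;               // assumed sample mean and standard deviation

k  = length(fcount);             // number of classes
n  = sum(fcount);                // observations inside the classes
// Normal CDF at every class boundary (cdfnor accepts equal-size vectors)
Fb = cdfnor("PQ", xclass, xbar*ones(1,k+1), sx*ones(1,k+1));
pclass = Fb(2:k+1) - Fb(1:k);            // normal probability of each class
fc = n * pclass / (Fb(k+1) - Fb(1));     // expected counts, summing to n
chi2 = sum((fcount - fc).^2 ./ fc);      // chi-square parameter
mprintf("chi2 = %.4f with k = %d classes\n", chi2, k);
```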
Application of the function histnorm
In this example we apply function histnorm to a set of 200 data values between 0 and 100
generated using function rand as follows:
-->x = int(100*rand(1,200));
First, we check the minimum and maximum value of the data:
-->min(x), max(x)
ans =
0.
ans =
99.
A set of class boundaries of 0, 10, 20, ..., 100 will produce 10 classes for this sample:
-->xclass = [0:10:100];
Next, we load the function histnorm and apply the function to the data stored in x using the class boundaries stored in xclass:
-->getf('histnorm')
-->histnorm(x,xclass)
ans =
1.9583514
The value returned is the chi-square parameter for the normal curve fitting. The plot of the histogram with the superimposed normal curve is:
A second example for the same data sample is presented next in which we use 20 classes, with class boundaries 0, 5, 10, ..., 95, 100, to classify the data:
-->xclass=[0:5:100];
The results from function histnorm are the chi-square parameter and the following plot:
ans =
2.0146916
The function can be invoked with a vector of three values in the left-hand side to produce not
only the chi-square parameter and the plot, but also the class marks and the frequency count
of the sample:
-->[X2,cm,f] = histnorm(x,[0:10:100])
f =
column 1 to 9
! 20. 18. 27. 18. 23. 22. 16. 18. 14. !
column 10
! 24. !
cm =
! 5. 15. 25. 35. 45. 55. 65. 75. 85. 95. !
X2 =
1.9583514
Notice that in the two graphs shown above, the normal curve does not fit the histograms very well. The main reason is that the data were generated from a uniform distribution (i.e., using the default settings of SCILAB's function rand) and not from a normal distribution. Later in this chapter we deal with the generation of data from distributions other than the uniform distribution, and we will use function histnorm to check how well those data fit the normal distribution.
Plotting data against their normal scores
Assume that the continuous random variable X follows the normal distribution with mean μ and standard deviation σ. Given a probability p (0<p<1) such that P(X<x)=p with X ~ N(μ,σ), the value of x is referred to as the normal score for p. [Note: In some references in the statistical literature the normal scores are related to a probability α = 1 - p, so that if P(X>$x_\alpha$) = α, with X ~ N(μ,σ), $x_\alpha$ is the normal score for α.]
Suppose that we have an ordered data set, xp = {$xp_1 < xp_2 < \cdots < xp_n$}, that follows the normal distribution with mean and standard deviation equal to the sample's mean ($\bar{x}$) and standard deviation ($s_x$). Also, assume that the probability of the interval [$xp_i$, $xp_{i+1}$] is the same for all values of i = 1, 2, ..., n-1, say P($xp_i$ < X < $xp_{i+1}$) = q. Also, assume that P(X<$xp_1$) = P(X>$xp_n$) = q. Thus, the entire area under the normal curve is split into n+1 sub-regions of the same area q as illustrated in the figure below.
The value of q is, therefore, q = 1/(n+1), and we can write:
$$P(X<xp_1) = q,\quad P(X<xp_2) = 2q,\quad \ldots,\quad P(X<xp_i) = iq,\quad \ldots,\quad P(X<xp_n) = nq.$$
In general,
$$P(X<xp_i) = \frac{i}{n+1} = p_i,$$
for i = 1, 2, ..., n. The values $p_i$ are referred to as plotting positions, for they are used to obtain the corresponding normal scores $xp_i$.
Given an ordered data set, x = {$x_1 < x_2 < \cdots < x_n$}, of size n, we can generate a vector of plotting positions, $p_i = i/(n+1)$, and obtain a set of normal scores $xp_i$ by using the function call
cdfnor("X",xbar,sx,p(i),1-p(i)),
where xbar and sx are the mean ($\bar{x}$) and standard deviation ($s_x$) of the data set, and p(i) is the plotting position $p_i$.
If the given data set, x, does indeed follow the normal distribution with mean μ = $\bar{x}$ and standard deviation σ = $s_x$, a plot of normal scores xp versus the original data x should produce a straight line.
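Because cdfnor accepts vectors of equal size, the plotting positions and normal scores can be computed without a loop. The sketch below assumes a recent Scilab release, so it uses gsort and stdev in place of the sortup and st_deviation calls used elsewhere in this document. The seven data values are made up for illustration. The plotting positions are symmetric about 0.5, so the normal scores come out symmetric about the sample mean.

```scilab
// Hypothetical small sample (illustrative values only)
x  = [12.1 9.4 10.8 8.7 11.5 10.2 9.9];
xx = gsort(x, "g", "i");            // data in increasing order
n  = length(xx);                    // sample size
xm = mean(xx);                      // sample mean
sx = stdev(xx);                     // sample standard deviation
pp = (1:n) / (n + 1);               // plotting positions p_i = i/(n+1)
// Normal scores xp_i from the inverse normal CDF, all at once
xp = cdfnor("X", xm*ones(1, n), sx*ones(1, n), pp, 1 - pp);
disp([xx; xp]);                     // ordered data (row 1) and scores (row 2)
```

Plotting xp against xx then gives the normal probability plot described above.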
A function to produce a plot of data versus normal scores
The following function, normplot, takes as input a data set, or sample, x = {$x_1, x_2, \ldots, x_n$}, sorts it in increasing order, obtains the plotting positions $p_i$, calculates the normal scores $xp_i$, and plots the normal scores versus the ordered data. It also plots a straight line representing y = x, i.e., the exact fit for a normal distribution. The closer the plot of normal scores vs. data is to this straight line, the more closely the data set follows the normal distribution.
function normplot(x)
//This function produces a normal probability
//paper plot for the data in (row) vector x
xx = sortup(x); //order sample in increasing order
xm = mean(xx); //mean of sample
sx = st_deviation(xx); //standard deviation of sample
nn = length(x); //sample size
//Calculating plotting positions and normal scores
pp = []; xp = [];
for j = 1:nn
pp = [pp j/(nn+1)];
xp = [xp cdfnor("X",xm,sx,pp(j),1-pp(j))];
end;
//Determine the plotting rectangle
xmin1 = min(xx); xmin2 = min(xp); xmin = min(xmin1,xmin2);
xmax1 = max(xx); xmax2 = max(xp); xmax = max(xmax1,xmax2);
ymin = min(xp); ymax = max(xp);
//Produce a graduated scale
[xminp, xmaxp, nxp] = graduate(xmin,xmax);
[yminp, ymaxp, nyp] = graduate(ymin,ymax);
//Plot scores vs. data and exact normal distribution fitting
plot2d(xp,xp,[1],'011','y',[xminp yminp xmaxp ymaxp])
xset('mark',-9,2);
plot2d(xx,xp,[-9],'011','y',[xminp yminp xmaxp ymaxp])
xtitle('Normal probability plot','x','z');
//end function normplot
An application of this function is shown next. First, we produce a sample of 200 data points using a uniform distribution. Next, we load function normplot and produce the normal probability plot.
-->x =int(100*rand(1,200));
-->getf('normplot')
-->normplot(x)
The resulting graph shows that the data do not follow the normal distribution, particularly near the lowest and highest values of the data set.
The lognormal distribution
If the random variable Y = ln(X) follows the normal distribution with mean $\mu_{\ln(X)}$ and standard deviation $\sigma_{\ln(X)}$, then we say that the random variable X follows the lognormal distribution. The probability density function of the lognormal distribution is given by
$$f_X(x) = \frac{1}{\sigma_{\ln(X)}\, x \sqrt{2\pi}} \exp\left[-\frac{(\ln x - \mu_{\ln(X)})^2}{2\sigma_{\ln(X)}^2}\right], \qquad x > 0,$$
with
$$\mu_X = \exp\left(\mu_{\ln(X)} + \frac{1}{2}\sigma_{\ln(X)}^2\right), \qquad \mathrm{Var}(X) = \exp\left(2\mu_{\ln(X)}\right)\exp\left(\sigma_{\ln(X)}^2\right)\left[\exp\left(\sigma_{\ln(X)}^2\right)-1\right].$$
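The two closed-form moments can be checked numerically. The sketch below assumes a recent Scilab release and uses hypothetical helper names. It integrates x·f_X(x) and x²·f_X(x) with intg for $\mu_{\ln(X)}$ = 1.2 and $\sigma_{\ln(X)}$ = 0.5. The integration limits 1e-6 and 200 are assumptions, chosen so that the neglected tails are negligible for these parameters.

```scilab
// Lognormal pdf with mu_ln = 1.2 and sigma_ln = 0.5 fixed (hypothetical names)
function f = logn_pdf(x)
    mu = 1.2; sigma = 0.5;
    f = exp(-(log(x) - mu).^2 ./ (2*sigma^2)) ./ (sigma .* x .* sqrt(2*%pi));
endfunction
function g = logn_m1(x)
    g = x .* logn_pdf(x);            // integrand for E[X]
endfunction
function g = logn_m2(x)
    g = x.^2 .* logn_pdf(x);         // integrand for E[X^2]
endfunction

mu = 1.2; sigma = 0.5;
EX   = exp(mu + sigma^2/2);                         // closed-form mean
VarX = exp(2*mu) * exp(sigma^2) * (exp(sigma^2) - 1); // closed-form variance
m1 = intg(1e-6, 200, logn_m1);                      // numerical E[X]
m2 = intg(1e-6, 200, logn_m2);                      // numerical E[X^2]
mprintf("E[X]: %.6f vs %.6f   Var(X): %.6f vs %.6f\n", m1, EX, m2 - m1^2, VarX);
```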
For calculating probabilities we can use the normal distribution cdf by first calculating the natural log of the variable. For example, if X ~ lognormal($\mu_{\ln(X)}$=1.2, $\sigma_{\ln(X)}$=0.5), to calculate the probability P(X<2) use P(X<2) = P(ln(X)<ln(2)) = P(Y<0.6931), where Y ~ N(1.2, 0.5). We can use function cdfnor to calculate this probability in SCILAB as follows:
-->cdfnor("PQ",log(2),1.2,0.5)
ans =
.1553616
Suppose that we want to find the inverse cumulative distribution function, i.e., a value x for which P(X<x) = 0.35, given $\mu_{\ln(X)}$=1.2, $\sigma_{\ln(X)}$=0.5. We can use:
-->cdfnor("X",1.2,0.5,0.35,0.65)
ans =
1.0073398
The previous result actually gives a value of Y = ln(X) with Y ~ N(1.2, 0.5). The corresponding
value of X is calculated as X = exp(Y), i.e.,
-->exp(ans)
ans =
2.7383068
A graph of the lognormal probability density function for $\mu_{\ln(X)}$=1.2, $\sigma_{\ln(X)}$=0.5 is produced by using:
-->deff('[ff]=fX(x,mu,sigma)',...
-->'ff=exp(-(log(x)-mu).^2./(2.*sigma.^2))./(sigma.*x.*sqrt(2.*%pi))')
-->mu=1.2;sigma=0.5;xx=[0.01:0.1:10];yy=fX(xx,mu,sigma);
-->plot(xx,yy,'x','fX(x)','Log-normal pdf')
Generating synthetic data
In this section we present pre-defined and user-defined functions that allow us to generate data that follow a particular probability distribution. We refer to such data as synthetic data.
Generating normally-distributed synthetic data
In the examples presented in the previous section on applications of the normal distribution, we generated data by using the function rand, which, by default, produces random data uniformly distributed in the interval [0,1]. The function rand can also be used to produce normally distributed data, z, that follow the standard normal distribution, i.e., Z ~ N(0,1). To do this, first use the function call
rand('normal')
and next use the function call
rand(n,m)
where n and m are integers. The last call to function rand will produce a matrix of n rows and m columns whose elements are random numbers following the standard normal distribution.
Recalling that the standardized normal variate is defined as
Z = (X-μ)/σ,
values of x can be obtained from values of z by using
x = μ + σz.
The following example illustrates how to use function rand, after setting it to 'normal', to produce 200 data points that follow the normal distribution with mean μ = 150 and standard deviation σ = 50:
-->x = 150 + 50.*rand(1,200);
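The same transformation can be checked on a much larger sample. The sketch below assumes a recent Scilab release; the seed 0 and the sample size 100,000 are arbitrary choices. It switches rand to normal mode and applies x = μ + σz. It then switches rand back to uniform, so that later calls to rand behave as in the earlier sessions.

```scilab
rand("seed", 0);              // fixed seed so the run is repeatable
rand("normal");               // rand now returns N(0,1) values
z = rand(1, 100000);          // standard normal sample
rand("uniform");              // restore the default uniform generator
mu = 150; sigma = 50;
x = mu + sigma * z;           // x = mu + sigma*z follows N(mu, sigma)
mprintf("mean = %.3f, standard deviation = %.3f\n", mean(x), stdev(x));
```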
To verify that the data do indeed follow the normal distribution, we apply functions histnorm and normplot to this data set. To use function histnorm, we first determine the minimum and maximum values of the data set to decide which class boundaries to use in the histogram:
-->xmin = min(x), xmax = max(x)
xmin = 34.558873
xmax = 317.59609
We select for class boundaries the values 25, 50, 75, ..., 300, 325:
-->xclass = [25:25:325];
The resulting histogram and superimposed normal curve are shown next:
-->histnorm(x,xclass);
The fitting of the histogram to the corresponding normal curve is relatively good, in spite of
the apparent discrepancy towards the center of the data. We can also use function normplot
to check the normality of the data as follows:
-->normplot(x)
The resulting normal probability plot is:
The plot suggests that the data follows the normal distribution for most of the range except for
values larger than about 220.
Additional applications of function rand
SCILAB's function rand, like most numerical random number generators, uses a number, known as the seed, to produce random numbers. To find out the current value of the seed in function rand use:
-->rand('seed')
ans = 8.096E+08
To find out which type of random number generator is active in function rand (i.e., normal or uniform) use:
-->rand('info')
ans = normal
To change the function rand back to uniform use:
-->rand('uniform')
To change the seed to the number 15, for example, use:
-->rand('seed',15)
The first 10 random numbers generated by rand after seeding it with a value of 15 are:
-->rand(1,10)
ans =
column 1 to 5
! .1018111 .5348560 .9628528 .1235873 .6667947 !
column 6 to 10
! .4106913 .6578733 .6756193 .1201851 .0268646 !
After generating those 10 random numbers the value of the seed has changed to:
-->rand('seed')
ans =
57691269.
If, for some reason, you need to restart the previous sequence of random numbers, you can simply re-seed function rand with the value of 15:
-->rand('seed',15)
Check that you get the same sequence of random numbers by comparing the following 5 random numbers with the first 5 random numbers generated earlier after using seed = 15:
-->rand(1,5)
ans =
! .1018111 .5348560 .9628528 .1235873 .6667947 !
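The reproducibility property described above can be written as a small self-checking script. This sketch assumes a recent Scilab release and uses only rand('uniform') and rand('seed', ...). It does not rely on the particular numbers printed above, which may differ between Scilab versions.

```scilab
rand("uniform");       // make sure the uniform generator is active
rand("seed", 15);      // seed with 15
a = rand(1, 5);        // first five numbers after seeding
s = rand("seed");      // the seed has moved on after the draws
rand("seed", 15);      // re-seed with the same value
b = rand(1, 5);        // the same five numbers again
disp(a); disp(b);
```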
SCILAB function for generating synthetic data
SCILAB provides function grand (generating random numbers) to generate a vector or matrix with data that follows, among others, the following distributions: binomial, Poisson, gamma, beta, exponential, uniform integer, uniform real, normal, chi-squared, and F. Two general calls to the function are:
[x] = grand(m,n,dist_type,dist_parameters)
[x] = grand(A,dist_type,dist_parameters)
where dist_type is a string identifying the type of distribution, and dist_parameters is a list of
the parameters defining the distribution. In the first form of the call the values m and n
represent the number of rows and columns of a matrix to be generated containing random
numbers that follow the desired distribution. In the second form of the function call an
existing matrix A is provided so that the function generates a new matrix with the same
dimensions as A containing the random numbers that follow the desired distribution.
The following strings identify the type of distribution requested. We also identify the parameters required for each distribution:
String   Distribution      Parameters
bin      Binomial          N, P
poi      Poisson           μ
bet      Beta              α, β
gam      Gamma             α = shape, λ = rate
exp      Exponential       1/λ (the mean)
nor      Normal            μ, σ
chi      Chi-square        ν
f        F                 νN, νD
uin      Uniform integer   a, b
unf      Uniform real      a, b
The specific function calls for each probability distribution are shown next:
Binomial: x=grand(m,n,'bin',N,P), x=grand(A,'bin',N,P)
Poisson: x=grand(m,n,'poi',μ), x=grand(A,'poi',μ)
Beta: x=grand(m,n,'bet',α,β), x=grand(A,'bet',α,β)
Gamma: x=grand(m,n,'gam',α,λ), x=grand(A,'gam',α,λ)
Exponential: x=grand(m,n,'exp',1/λ), x=grand(A,'exp',1/λ)
Normal: x=grand(m,n,'nor',μ,σ), x=grand(A,'nor',μ,σ)
Chi-square: x=grand(m,n,'chi',ν), x=grand(A,'chi',ν)
F-distribution: x=grand(m,n,'f',νN,νD), x=grand(A,'f',νN,νD)
Uniform integer: x=grand(m,n,'uin',a,b), x=grand(A,'uin',a,b)
Uniform real: x=grand(m,n,'unf',a,b), x=grand(A,'unf',a,b)
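As a quick consistency check on the parameter conventions in the table, the sketch below draws large samples with several of these calls and compares their sample means with the theoretical means. The theoretical means are N·P for the binomial, μ for the Poisson, α/λ for the gamma with rate λ, the mean 1/λ for the exponential, μ for the normal, and (a+b)/2 for the uniform. The sketch assumes a recent Scilab release in which grand('setsd', s) seeds the current default generator; that generator may differ from the one described later in this section. The seed and sample size are arbitrary.

```scilab
grand("setsd", 12345);                 // seed the current generator
n = 100000;                            // large sample size for the check
xb = grand(1, n, "bin", 20, 0.35);     // binomial, mean N*P = 7
xp = grand(1, n, "poi", 12.5);         // Poisson, mean mu = 12.5
xg = grand(1, n, "gam", 2, 3);         // gamma, shape 2 and rate 3: mean 2/3
xe = grand(1, n, "exp", 2);            // exponential with mean 2
xn = grand(1, n, "nor", 2500, 1250);   // normal, mean 2500
xu = grand(1, n, "unf", -5, 5);        // uniform real, mean 0
disp([mean(xb) mean(xp) mean(xg) mean(xe) mean(xn) mean(xu)]);
```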
Examples of synthetic data generation using function grand
The following examples demonstrate how to use function grand to generate sets of 200 data
points that follow specific probability distributions. After the data are generated we
determine their maximum and minimum values, select class boundaries for histograms of the
data, and use functions histnorm and normplot to check how close the data are to normality.
We start the exercises by loading these two functions:
-->getf('histnorm');getf('normplot');
Binomial data
-->x=grand(1,200,'bin',20,0.35);xmin=min(x),xmax=max(x)
xmin = 2.
xmax = 14.
-->xclass=[2:2:14];xset('window',1);histnorm(x,xclass);
-->xset('window',2);normplot(x);
Poisson data
-->x=grand(1,200,'poi',12.5);xmin=min(x),xmax=max(x)
xmin =
4.
xmax =
23.
Beta data
-->x=grand(1,200,'bet',2,3);xmin=min(x),xmax=max(x)
xmin = .0480813
xmax = .9132797
-->xclass=[0:0.1:1];xset('window',1);histnorm(x,xclass);
Gamma data
-->x=grand(1,200,'gam',2,3);xmin=min(x),xmax=max(x)
xmin = .0042184
xmax = 2.6455776
-->xclass=[0:0.4:2.8];xset('window',1);histnorm(x,xclass);
Normal data
-->x=grand(1,200,'nor',2500,1250);xmin=min(x),xmax=max(x)
xmin =
-1294.6718
xmax =
6467.2541
-->xclass=[-1000:1000:7000];xset('window',1);histnorm(x,xclass);
Chi-square data
-->x=grand(1,200,'chi',12);xmin=min(x),xmax=max(x)
xmin =
3.8312405
xmax =
28.583772
F distribution data
-->x=grand(1,200,'f',10,5);xmin=min(x),xmax=max(x)
xmin = .110966
xmax = 53.694396
-->xclass=[0:2:12];histnorm(x,xclass);
-->xclass=[0:0.5:6];histnorm(x,xclass);
Uniform integer data
-->x=grand(1,200,'uin',-5,5);xmin=min(x),xmax=max(x)
xmin =
-5.
xmax =
5.
Uniform real data
-->x=grand(1,200,'unf',-5,5);xmin=min(x),xmax=max(x)
xmin =
-4.9677424
xmax =
4.9660118
Additional notes on function grand
The previous examples were used to illustrate applications of function grand to the generation
of data that follows the binomial, Poisson, gamma, beta, exponential, normal, chi-square, F-,
uniform integer, and uniform real distributions. Function grand allows the user to obtain data
that follow other distributions that are not presented in this book, such as the negative
binomial distribution, the multinomial distribution, the non-central F distribution, and the non-
central chi-square distribution. (To find information about these and other distributions
consult a statistics and probability textbook such as Spanos, A., 1999, Probability Theory and
Statistical Inference - Econometric Modeling with Observational Data, Cambridge University
Press, Cambridge, U.K.).
To obtain additional details on the use of function grand use:
-->help grand
Function grand has access to 32 different random number generators that constitute the basis upon which random numbers that follow a particular probability distribution are generated. By default, functions rand and grand use generator number 1. To check which random number generator is currently active, use:
-->grand('getcgn')
ans =
1.
This result indicates that you are currently using SCILAB's default random number generator.
The random number generators provided by SCILAB for use with function grand require two seed numbers. To see the current seed numbers you can use the statement:
-->seeds = grand('getsd')
seeds =
1.0E+08 *
! 20.45933 9.2172801 !
You can re-initialize those seeds to the original seeds by using:
-->grand('initgn',-1)
ans =
1.
We can check the initial seeds after re-initialization by using:
-->seeds = grand('getsd')
seeds =
1.0E+08 *
! 12.345679 1.2345679 !
You can also re-seed the generator (i.e., provide new seeds) by using the following call to function grand:
-->grand('setall',10,20)
ans =
setall
To check that the new seeds are active use:
-->seeds=grand('getsd')
seeds =
! 10. 20. !
To change the random number generator from generator number 1 to generator number 5, for example, use:
-->grand('setcgn',5)
ans =
5.
The following call to function grand can be used to verify that the change of generator has been made:
-->grand('getcgn')
ans =
5.
To check the values of the seeds for the current generator use:
-->seeds=grand('getsd')
seeds =
! 3.795E+08 77757764. !
Pseudo-random generators
The random number generators used in SCILAB and other computer applications are known as
pseudo-random generators because, after generating a sufficiently long sequence of numbers,
the numbers start repeating. Therefore, they are not strictly random generators, but only
quasi-random or pseudo-random.
The random number generator provided with SCILAB is able to produce $2.3\times10^{18}$ numbers before repetition of numbers occurs. This collection of numbers is partitioned into 32 pseudo-random generators, each containing $2^{20}$ = 1,048,576 blocks of non-overlapping random numbers. Each block is $2^{30}$ = 1,073,741,824 numbers in length.
Given the size of the sequences of random numbers that can be generated with each of SCILAB's 32 pseudo-random number generators, we are confident that the numbers thus
generated are random enough for most practical applications. Furthermore, use of the default
generator should be enough for most applications unless you
Another application of function grand is in the generation of permutations of a column vector.
For example, the following application produces 10 permutations of the vector M containing
the first five positive integers. The permutations are shown as columns of a matrix.
-->M = [1 2 3 4 5]'; //column vector
-->grand(10,'prm',M)
ans =
! 1. 2. 4. 1. 4. 4. 5. 4. 1. 3. !
! 3. 1. 2. 4. 2. 2. 1. 3. 4. 2. !
! 2. 3. 5. 5. 5. 3. 2. 2. 2. 5. !
! 5. 4. 3. 3. 3. 5. 3. 1. 3. 4. !
! 4. 5. 1. 2. 1. 1. 4. 5. 5. 1. !
Generating log-normally-distributed data
To generate log-normally distributed data we first generate a set of normally distributed data and then apply the exponential function to that data set. For example, if X follows the lognormal distribution with $\mu_{\ln(X)}$=1.2, $\sigma_{\ln(X)}$=0.5, we can use the following SCILAB commands to generate a set of 200 data points. We apply functions histnorm and normplot to this data set to check how close the data are to normality.
-->y=grand(1,200,'nor',1.2,0.5); //Generate normal data N(1.2,0.5)
-->x=exp(y); //Generate log-normal data by using exp
-->xmin=min(x),xmax=max(x) //Determine min and max values
xmin =
1.1210567
xmax =
11.161347
-->xclass=[0:2:12];histnorm(x,xclass); //Histogram
-->normplot(x); //Normal probability plot
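To check the construction numerically, the sketch below generates a much larger lognormal sample the same way. It then compares the sample mean with the lognormal mean formula exp(μ + σ²/2), and the statistics of ln(x) with the parameters 1.2 and 0.5. It assumes a recent Scilab release; the seed 2001 and the sample size are arbitrary.

```scilab
grand("setsd", 2001);                     // arbitrary seed, repeatable run
n = 100000;                               // large sample for the check
y = grand(1, n, "nor", 1.2, 0.5);         // Y ~ N(1.2, 0.5)
x = exp(y);                               // X = exp(Y) is lognormal
EX = exp(1.2 + 0.5^2/2);                  // mean of X from the lognormal formula
mprintf("mean(x) = %.4f (theory %.4f)\n", mean(x), EX);
mprintf("mean(log(x)) = %.4f, stdev(log(x)) = %.4f\n", mean(log(x)), stdev(log(x)));
```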
Generating data that follows the Weibull distribution
SCILAB does not provide a function to generate data that follows the Weibull distribution. However, using the uniformly-generated random numbers from function rand we can generate numbers p between 0 and 1 that represent probabilities p = $F_X(x)$ = P(X<x). Next, we use the cumulative distribution function for the Weibull distribution, namely,
$$F_X(x) = 1 - \exp(-\alpha x^{\beta}), \qquad x > 0,\ \alpha > 0,\ \beta > 0,$$
and solve for x given values of p, i.e.,
$$x = \left[-\frac{1}{\alpha}\ln(1-p)\right]^{1/\beta}.$$
The following SCILAB commands are used to generate 200 data points that follow the Weibull distribution with α = a = 2, β = b = 3. We also use functions histnorm and normplot to check how close these data are to normality.
-->getf('histnorm');getf('normplot') //Load functions
-->p=rand(1,200); //Generate probabilities
-->a=2; b=3; //parameters of Weibull distribution
-->x = (-log(1-p)/a).^(1/b); //generate Weibull data
-->xmin=min(x), xmax = max(x) //check data range
xmin =
.1230276
xmax =
1.3553315
-->xclass = [0:0.1:1.4]; //select classes for histogram
-->histnorm(x,xclass); //plot histogram and normal curve
-->normplot(x) //create normal probability plot
It is interesting to notice that these Weibull data are very close to normality.
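The inverse formula is exact, so applying the Weibull CDF to the generated values must return the original probabilities. Setting p = 0.5 in the inverse formula gives the median $[\ln(2)/\alpha]^{1/\beta}$, and about half of a large simulated sample should fall below it. The sketch below checks both for a = 2 and b = 3. It assumes a recent Scilab release and uses grand(...,'def') instead of rand for the uniform numbers; that choice is arbitrary.

```scilab
a = 2; b = 3;                               // alpha = a, beta = b, as above
// Exact round trip for a few chosen probabilities
p  = [0.05 0.25 0.5 0.75 0.95];
x  = (-log(1 - p) / a) .^ (1/b);            // inverse Weibull CDF
Fx = 1 - exp(-a * x.^b);                    // Weibull CDF at those x values
xmed = (log(2) / a)^(1/b);                  // median: inverse CDF at p = 0.5

// Simulation: about half of a large sample should fall below the median
grand("setsd", 3);
pr = grand(1, 100000, "def");               // uniform numbers in [0,1)
xs = (-log(1 - pr) / a) .^ (1/b);           // Weibull data
frac = size(find(xs < xmed), "*") / length(xs);
mprintf("median = %.4f, fraction below median = %.4f\n", xmed, frac);
```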
Generating data that follows the Student's t distribution
Function grand does not allow for the generation of data following the Student's t distribution. However, SCILAB provides function cdft, which lets you obtain the inverse of the cumulative distribution. Using an approach similar to that shown above for the Weibull distribution, we can generate random probability values through function rand, and then use function cdft to generate the data required.
The following example illustrates the procedure:
-->getf('histnorm');getf('normplot'); //Load functions histnorm & normplot
-->pp = rand(1,200); //Generate random probabilities
-->x = []; //This line and the for...end
-->for j =1:200 //construct calculate the values of x
--> x = [x cdft("T",6,pp(j),1-pp(j))];
-->end;
-->xmin=min(x), xmax=max(x) //Determine min & max values
xmin = -6.9441809
xmax = 3.4425429
-->xclass=[-7:1:4];xset('window',1);histnorm(x,xclass); //Histogram
-->xset('window',2);normplot(x); //Normal probability plot
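Because cdft accepts vector arguments of equal size, the for loop above can be replaced by a single call. The sketch below assumes a recent Scilab release and uses grand(...,'def') for the uniform probabilities. It also checks two properties that follow from the definition. First, applying cdft("PQ",...) to the generated values returns the original probabilities. Second, the t quantiles are symmetric about zero.

```scilab
grand("setsd", 7);                          // arbitrary seed
n  = 200;
pp = grand(1, n, "def");                    // uniform probabilities in [0,1)
df = 6 * ones(1, n);                        // nu = 6 for every value
x  = cdft("T", df, pp, 1 - pp);             // inverse t CDF in one call
[P, Q] = cdft("PQ", x, df);                 // forward CDF of the generated values
t975 = cdft("T", 6, 0.975, 0.025);          // 97.5th percentile for nu = 6
t025 = cdft("T", 6, 0.025, 0.975);          // 2.5th percentile for nu = 6
mprintf("t(0.975; 6) = %.4f, t(0.025; 6) = %.4f\n", t975, t025);
```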
Generating data that follows a discrete distribution
Using function grand we were able to generate discrete data that follows the binomial,
Poisson, and uniform integer distributions. In this section we present a general method for the
generation of data given a discrete distribution in the form of a table. For example, the following table shows the probability mass function, $f_X(x)$ = P(X=x), and cumulative distribution function, $F_X(x)$ = P(X≤x), of a discrete random variable X:
X      f_X(x)   F_X(x)   Random numbers
                         From    To
0.5    0.10     0.10     0.00    0.10
1.5    0.25     0.35     0.10    0.35
2.5    0.20     0.55     0.35    0.55
3.5    0.15     0.70     0.55    0.70
4.5    0.15     0.85     0.70    0.85
5.5    0.15     1.00     0.85    1.00
The last two columns of the table represent the range of probabilities corresponding to the
cumulative distribution function for each value of X. The procedure for generating data consists of obtaining a value of random probability p from a uniform distribution, e.g., using function rand, and then assigning a value of X according to the range of values of the
random numbers. Thus, if function rand produces the random number 0.25, we assign to x the
corresponding value X = 1.5.
The following function, discrand, will generate an n×m matrix of random numbers given vectors of values of X and FX, representing the values of a discrete random variable and its corresponding cumulative distribution function.
function [x] = discrand(n,m,xx,FX)
//A function to generate a matrix nxm
//following a discrete probability distribution
//represented by vectors xx and FX = P(X<=xx)
nx = length(xx);
pp = rand(n,m);
x = zeros(n,m);
FXX = [0.00 FX];
for i = 1:n
for j = 1:m
for k = 1:nx
if pp(i,j)>FXX(k) & pp(i,j)<=FXX(k+1) then
x(i,j) = xx(k);
end;
end;
end;
end;
//end function discrand
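The triple loop in discrand can be shortened by assigning all matching entries at once with find. The sketch below is an alternative under stated assumptions, not the author's function. The name discrand_v is hypothetical, the function uses grand(...,'def') rather than rand for the uniform numbers, and it targets a recent Scilab release. With a large sample, the relative frequency of each value of X should approach $f_X(x)$ from the table.

```scilab
// Vectorized variant of discrand (hypothetical name discrand_v): each
// uniform number p is mapped to xx(k) when FXX(k) < p <= FXX(k+1)
function x = discrand_v(n, m, xx, FX)
    pp  = grand(n, m, "def");         // uniform numbers in [0,1)
    x   = zeros(n, m);
    FXX = [0 FX];
    for k = 1:length(xx)
        idx = find(pp > FXX(k) & pp <= FXX(k+1));
        x(idx) = xx(k);               // assign every matching entry at once
    end
endfunction

grand("setsd", 42);                   // arbitrary seed
X  = [0.5:1.0:5.5];
FX = [0.10 0.35 0.55 0.70 0.85 1.00];
x  = discrand_v(1, 100000, X, FX);
// Observed relative frequency of each value of X
fobs = zeros(1, 6);
for k = 1:6
    fobs(k) = size(find(x == X(k)), "*") / length(x);
end
disp(fobs);
```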
An application of the function to generate 200 data points that follow the probability distribution shown in the table above is presented next. We first load function discrand, then enter the values of X and $F_X(x)$, and generate a row vector of 200 points. Next, we load functions histnorm and normplot to check how well the data follow a normal distribution.
-->getf('discrand')
-->X = [0.5:1.0:5.5]; FX = [0.10,0.35,0.55,0.70,0.85,1.00];
-->x=discrand(1,200,X,FX);
-->getf('histnorm');getf('normplot');
-->xmin=min(x), xmax=max(x)
xmin =
.5
xmax =
5.5
-->xclass=[0.5:0.5:5.5];histnorm(x,xclass)
ans =
24.643214
-->normplot(x)
Statistical simulation
Many physical or other types of systems are described by one or more mathematical relationships (e.g., algebraic, difference, or differential equations) of diverse degrees of complexity. We will refer to the set of mathematical relationships that describe a physical system as a model. A model typically depends on certain constant values known as the
parameters of the model. In the simplest of cases, a model can be represented by a black box
into which a set of input data is provided, and from which a set of output results is obtained.
This is illustrated in the following figure:
If the model is such that for a given set of input data it always produces a predictable result, it
is referred to as a deterministic model. An example of a deterministic model is the equation
that describes the electric current, I, through a resistor, R, when a voltage, V, is applied across
the terminals of the resistor. The equation is
I = V/R.
If we apply a constant voltage $V_0$ to the resistor, we get back a constant electric current, $I_0 = V_0/R$. If we instead apply a variable voltage $V(t) = V_0 \sin(t)$, we obtain an electric current, $I(t) = (V_0/R)\sin(t)$. Thus, knowing the value of the resistance R and the input to the system, i.e., the voltage, $V_0$ or V(t), we can always find the value of the electric current. We cannot get more deterministic than this example.
If the input to the model is of a random nature, or if there is a random component to the
model itself, the model is said to be probabilistic or stochastic. For example, the black-box
model described above can be used to describe a hydrological basin. The input data is the
amount and duration of the precipitation falling on the basin on a certain period of time. (A
graphical representation of precipitation vs. time is referred to as a hyetograph). This input
is, by its own nature, random or stochastic. This means that we cannot know exactly the
amount of precipitation that will occur, say, in the next 24 hours.
Although a hydrological basin is extremely more complicated than an electric resistor, the
model used to predict the runoff (output) to the system can be a simple relationship involving
one or two parameters. (A graphical representation of the runoff coming out of the basin as a
function of time is known as a hydrograph). If the input hyetograph is known, then the output
hydrograph can be obtained in a deterministic way. However, because we do not know
exactly the input hyetograph for a particular period of time, except in a statistical manner, the
model is indeed a stochastic one.
Through the keeping of historical records of precipitation in the basin we can get a good idea
of the stochastic nature of precipitation to use as input for our stochastic model. We can then
generate synthetic data representing the precipitation and use it as input to the model. This
approach to modeling physical (or economic, or other) systems is known as a Monte
Carlo method. (The name derives from Monte Carlo, a district of the European principality of
Monaco famous for its casinos, where the laws of probability are seen in action night and
day.)
Monte Carlo methods find applicability in all types of models where there is a random
component to the input or parameters of the model. Statistical modeling can be used to
model, for example, economic responses from human populations, the distribution of soil
permeabilities in an aquifer, the distribution of animal or plant populations, traffic patterns on
highways or at airports, weather phenomena, etc. A simple application of a Monte Carlo method
to simulate the patterns of traffic through a service station is shown below.
Simulating traffic through a service station
Suppose we want to simulate the traffic through a service station in which only one customer
can be serviced at a time. We also assume that once a customer arrives at the service station,
he or she will not leave until service is provided. This is a simplistic model, but it could be
used to simulate a vehicle service station in a city or highway, a medical emergency room, a
highway service station for state or privately owned trucks, a store, etc.
The first customer arrives at a certain arrival time, $AT_1$ (Arrival Time). He or she is taken care of right away, so that the starting time of service for customer 1, $ST_1$ (Starting Time), coincides with his or her arrival time; thus, $ST_1 = AT_1$. The waiting time for customer 1 is, therefore, zero, i.e., $WT_1 = 0$. The number of customers awaiting service at this point is also zero, i.e., $NW_1 = 0$. The time required to service this first customer is referred to as $TS_1$ (Time of Service). The first customer leaves the service station at time $ET_1 = ST_1 + TS_1$ (Ending Time).
The second customer arrives at the service station at a time $AT_2$. If $AT_2 < ET_1$ (i.e., the second customer arrives before service for the first one has finished), the second customer must wait until the first customer leaves, so that $ST_2$ becomes $ET_1$ ($ST_2 = ET_1$). In this case, we can calculate a waiting time for the second customer equal to $WT_2 = ET_1 - AT_2$. Also, the number of customers waiting for service at this point is $NW_2 = 1$. If, instead, the second customer arrives at a time $AT_2 \ge ET_1$, then $ST_2 = AT_2$ and $WT_2 = 0$. In any event, the ending time for the second customer is calculated as $ET_2 = ST_2 + TS_2$.
We define the inter-arrival time between customers 1 and 2 as $IAT_1 = AT_2 - AT_1$. In general, the inter-arrival time between customers i and i+1 is $IAT_i = AT_{i+1} - AT_i$. The inter-arrival time ($IAT_i$) and the time of service ($TS_i$) are considered random variables of discrete nature. Thus, $IAT_i$ and $TS_i$ constitute random input to the model.
Suppose that we want to simulate the operation of the service center for n customers. We first generate n-1 values of inter-arrival time $\{IAT_1, IAT_2, \ldots, IAT_{n-1}\}$, as well as n values of the service time $\{TS_1, TS_2, \ldots, TS_n\}$. Then, we calculate the arrival times as
$AT_{i+1} = AT_i + IAT_i$, $i = 1, 2, \ldots, n-1$.
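In SCILAB this recurrence is simply a running sum. The short sketch below assumes, as function service does, that the first customer arrives at time $AT_1 = 0$; the inter-arrival times are those of the worked example given later in this section.

```scilab
// Arrival times from inter-arrival times (first arrival assumed at time 0)
IAT = [0.5 0.75 0.5 0.25 0.5];   // n-1 = 5 inter-arrival times
AT  = [0, cumsum(IAT)];          // AT(i+1) = AT(i) + IAT(i)
disp(AT)                         // 0  0.5  1.25  1.75  2  2.5
```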
As indicated earlier, the starting and ending times for the first customer are $ST_1 = AT_1$ and $ET_1 = ST_1 + TS_1$. Also, the waiting time and the number of customers waiting at this stage are both zero, i.e., $WT_1 = 0$ and $NW_1 = 0$. The starting time for customer 2 is obtained as follows:
If $AT_2 \ge ET_1$, then $ST_2 = AT_2$, $WT_2 = 0$, $NW_2 = 0$.
If $AT_2 < ET_1$, then $ST_2 = ET_1$, $WT_2 = ET_1 - AT_2$, and $NW_2 = 1$.
For the third customer, we need to check the arrival time, AT
3
, against the ending times of
both the first and second customers so we can determine the starting time, the waiting time,
and the number of customers waiting at that point. The following piece of pseudo-code can
be used to determine such values:
for j = 2:n
    NW(j) = 0
    WT(j) = 0
    for k = 1:j-1
        if AT(j) < ET(k) then
            NW(j) = NW(j) + 1
            WT(j) = ET(k) - AT(j)
            ST(j) = ET(k)
        else
            ST(j) = AT(j)
        end
    end
    ET(j) = ST(j) + TS(j)
end
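Because each customer can start service only after the previous one ends, the ending times $ET_1, ET_2, \ldots$ never decrease (assuming nonnegative service times). The inner loop above therefore always settles on $ST_j = \max(AT_j, ET_{j-1})$, and $NW_j$ is just the number of earlier customers with $ET_k > AT_j$. The sketch below performs the same calculation in that compact form. The function name queue_times is hypothetical and is not part of the original program; applied to the inputs of the worked example that follows, it should reproduce the ST, ET, WT, and NW columns of that table.

```scilab
// Sketch: single-server queue times using ST(j) = max(AT(j), ET(j-1)).
// Assumes the first customer arrives at time 0 and all service times TS >= 0.
function [AT, ST, ET, WT, NW] = queue_times(IAT, TS)
    n  = length(TS);
    AT = [0, cumsum(IAT)];              // arrival times
    ST = zeros(1, n); ET = zeros(1, n);
    WT = zeros(1, n); NW = zeros(1, n);
    ST(1) = AT(1);
    ET(1) = ST(1) + TS(1);
    for j = 2:n
        ST(j) = max(AT(j), ET(j-1));    // wait only if the server is still busy
        WT(j) = ST(j) - AT(j);          // zero when AT(j) >= ET(j-1)
        NW(j) = size(find(ET(1:j-1) > AT(j)), "*"); // earlier customers not yet done
        ET(j) = ST(j) + TS(j);
    end
endfunction

[AT, ST, ET, WT, NW] = queue_times([0.5 0.75 0.5 0.25 0.5], [1 2 1 1 2 1]);
disp([AT' ST' ET' WT' NW'])
```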
A user-defined function to simulate traffic through a service station
The steps outlined above are put together in the following function, service:
function [MR] = service(IAT,TS)
//Simulation of traffic in a service station
//Given n-1 values of inter-arrival time IAT
//and n values of time of service TS.
//Results:
//Arrival time = AT, Starting time = ST
//Ending time = ET, Waiting time = WT
//Number of waiting customers = NW
//The first customer is assumed to arrive at time AT(1) = 0.
//
  n = length(TS);
  AT = zeros(1,n);
  ST = zeros(1,n);
  ET = zeros(1,n);
  NW = zeros(1,n);
  WT = zeros(1,n);
  IATT = [IAT 0];        //IAT must have exactly n-1 entries; pad to n for the table
  ST(1) = AT(1);
  ET(1) = ST(1) + TS(1);
  for j = 2:n            //arrival times
    AT(j) = AT(j-1) + IAT(j-1);
  end;
  for j = 2:n            //starting, waiting, and ending times
    NW(j) = 0;
    WT(j) = 0;
    for k = 1:j-1
      if AT(j) < ET(k) then   //customer k is still in the station
        NW(j) = NW(j) + 1;
        WT(j) = ET(k) - AT(j);
        ST(j) = ET(k);
      else
        ST(j) = AT(j);
      end;
    end;
    ET(j) = ST(j)+TS(j);
  end;
  disp(' ');
  printf('===============================================================\n');
  printf('  j       AT      IAT       ST       TS       ET       WT  NW\n');
  printf('===============================================================\n');
  for j = 1:n
    printf('%3.0f %8.2f %8.2f %8.2f %8.2f %8.2f %8.2f %3.0f\n',...
           j,AT(j),IATT(j),ST(j),TS(j),ET(j),WT(j),NW(j));
  end;
  printf('===============================================================\n');
  MR = [AT' IATT' ST' TS' ET' WT' NW'];  //Matrix of Results
  printf('AT = arrival times     IAT = inter-arrival times\n');
  printf('ST = starting times    TS  = time of service\n');
  printf('ET = ending times      WT  = waiting times\n');
  printf('NW = number of customers waiting\n');
  disp('   AT     IAT    ST     TS     ET     WT     NW');
endfunction
//end function service
As an example, suppose that we have the following inter-arrival times (IAT) and times of
service (TS):
-->IAT = [ 0.5 0.75 0.5 0.25 0.5];
-->TS = [ 1 2 1 1 2 1];
We can load function service and run it with the values of IAT and TS defined earlier to obtain
the following results:
-->Matrix_of_results = service(IAT,TS)
===============================================================
  j       AT      IAT       ST       TS       ET       WT  NW
===============================================================
  1     0.00      .50     0.00     1.00     1.00     0.00   0
  2      .50      .75     1.00     2.00     3.00      .50   1
  3     1.25      .50     3.00     1.00     4.00     1.75   1
  4     1.75      .25     4.00     1.00     5.00     2.25   2
  5     2.00      .50     5.00     2.00     7.00     3.00   3
  6     2.50     0.00     7.00     1.00     8.00     4.50   4
===============================================================
AT = arrival times     IAT = inter-arrival times
ST = starting times    TS  = time of service
ET = ending times      WT  = waiting times
NW = number of customers waiting
   AT     IAT    ST     TS     ET     WT     NW
Matrix_of_results =
!  0.     .5     0.     1.     1.     0.     0.  !
!  .5     .75    1.     2.     3.     .5     1.  !
!  1.25   .5     3.     1.     4.     1.75   1.  !
!  1.75   .25    4.     1.     5.     2.25   2.  !
!  2.     .5     5.     2.     7.     3.     3.  !
!  2.5    0.     7.     1.     8.     4.5    4.  !
The function is designed to provide a table of results, as well as a matrix summarizing the
results in case additional operations on those results are required within SCILAB. The
function, as applied in this case, is purely deterministic in the sense that for the given input we
get a unique result. To build a stochastic model of traffic through a service station we need to
provide random input. The following example shows how to obtain that random input.
Modeling traffic through a service station with random input
Suppose that the inter-arrival times and times of service for the service station model follow
the cumulative probability distributions shown in the following table:
 x = IAT   $F_X(x)$     x = TS   $F_X(x)$
   0.1      0.05         0.25     0.10
   0.2      0.10         0.50     0.20
   0.3      0.20         0.75     0.40
   0.4      0.35         1.00     0.70
   0.5      0.45         1.25     0.80
   0.6      0.50         1.50     0.90
   0.7      0.70         1.75     0.95
   0.8      0.75         2.00     1.00
   0.9      0.95
   1.0      1.00
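The idea behind a generator such as discrand can be sketched with the inverse-transform method for a discrete distribution: draw u uniformly from [0, 1) and return the smallest tabulated $x_i$ with $F_X(x_i) \ge u$. The sketch below applies it to the time-of-service column of the table above. The function name discrete_inverse and the seed are assumptions made for illustration; this is not necessarily how discrand itself is written.

```scilab
// Sketch: inverse-transform sampling from a tabulated discrete CDF.
// x: support values in increasing order; F: cumulative probabilities, F($) = 1.
// Returns an m-by-n matrix of samples.
function X = discrete_inverse(m, n, x, F)
    X = zeros(m, n);
    U = grand(m, n, "def");            // uniform numbers in [0, 1)
    for i = 1:m*n
        k = find(F >= U(i), 1);        // first index with F(k) >= u
        X(i) = x(k);
    end
endfunction

grand("setsd", 2001);                  // arbitrary seed, for reproducibility
xTS = (1:8) * 0.25;                    // 0.25, 0.50, ..., 2.00
FTS = [0.1, 0.2, 0.4, 0.7, 0.8, 0.9, 0.95, 1];
TS  = discrete_inverse(1, 10, xTS, FTS)
```

Because F is nondecreasing and ends at 1, find always returns a valid index, and each $x_i$ is drawn with probability $F_X(x_i) - F_X(x_{i-1})$.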
We want to analyze the traffic through the service station for 10 customers by generating 9
inter-arrival times and 10 service times from these distributions. The inter-arrival times and
times of service can be generated using function discrand as follows:
-->getf('discrand')
-->xIAT = [0.1:0.1:1.0]; FIAT = [0.05,0.1,0.2,0.35,0.45,0.5,0.7,0.75,0.95,1.0];
-->xTS = [0.25:0.25:2]; FTS = [0.1,0.2,0.4,0.7,0.8,0.9,0.95,1];
-->IAT = discrand(1,9,xIAT,FIAT) //generate IAT data
IAT =
! .4 .7 .7 .5 .4 .7 .5 .9 .1 !
-->TS = discrand(1,10,xTS,FTS) //generate TS data
TS =
! 1. .75 1. .75 .5 1.25 .75 .5 1. .5 !
With these values of IAT and TS we now call function service:
-->M = service(IAT,TS)
===============================================================
  j       AT      IAT       ST       TS       ET       WT  NW
===============================================================
  1     0.00      .40     0.00     1.00     1.00     0.00   0
  2      .40      .70     1.00      .75     1.75      .60   1
  3     1.10      .70     1.75     1.00     2.75      .65   1
  4     1.80      .50     2.75      .75     3.50      .95   1
  5     2.30      .40     3.50      .50     4.00     1.20   2
  6     2.70      .70     4.00     1.25     5.25     1.30   3
  7     3.40      .50     5.25      .75     6.00     1.85   3
  8     3.90      .90     6.00      .50     6.50     2.10   3
  9     4.80      .10     6.50     1.00     7.50     1.70   3
 10     4.90     0.00     7.50      .50     8.00     2.60   4
===============================================================
AT = arrival times     IAT = inter-arrival times
ST = starting times    TS  = time of service
ET = ending times      WT  = waiting times
NW = number of customers waiting
   AT     IAT    ST     TS     ET     WT     NW
M =
!  0.     .4     0.     1.     1.     0.     0.  !
!  .4     .7     1.     .75    1.75   .6     1.  !
!  1.1    .7     1.75   1.     2.75   .65    1.  !
!  1.8    .5     2.75   .75    3.5    .95    1.  !
!  2.3    .4     3.5    .5     4.     1.2    2.  !
!  2.7    .7     4.     1.25   5.25   1.3    3.  !
!  3.4    .5     5.25   .75    6.     1.85   3.  !
!  3.9    .9     6.     .5     6.5    2.1    3.  !
!  4.8    .1     6.5    1.     7.5    1.7    3.  !
!  4.9    0.     7.5    .5     8.     2.6    4.  !
Out of the matrix of results, M, we can extract individual columns of data, for example, the
waiting time data corresponds to the sixth column of M:
-->WT = M(:,6)
WT =
! 0. !
! .6 !
! .65 !
! .95 !
! 1.2 !
! 1.3 !
! 1.85 !
! 2.1 !
! 1.7 !
! 2.6 !
The number of waiting customers is extracted from the seventh column of matrix M:
-->NW = M(:,7)
NW =
! 0. !
! 1. !
! 1. !
! 1. !
! 2. !
! 3. !
! 3. !
! 3. !
! 3. !
! 4. !
The columns of data extracted from the matrix of results, M, can be used to obtain statistics
such as the mean and standard deviation:
-->WT_mean = mean(WT), WT_sdev = st_deviation(WT)
WT_mean = 1.295
WT_sdev = .7836701
-->NW_mean = mean(NW), NW_sdev = st_deviation(NW)
NW_mean = 2.1
NW_sdev = 1.2866839
We can also use function normplot to check how close the data are to normality:
-->getf('normplot')
-->normplot(NW')
-->normplot(WT')
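A single run of 10 customers is only one realization of the random traffic. In the spirit of the Monte Carlo method described earlier, the run can be repeated many times and the results averaged. The sketch below does this for the two distributions tabulated above. It rests on stated assumptions: the helper functions discrete_inverse and waiting_times are hypothetical, waiting_times uses the equivalent recurrence $ST_j = \max(AT_j, ET_{j-1})$ instead of calling service (which prints a table on every call), and the 1000 replications and the seed are arbitrary choices.

```scilab
// Sketch: Monte Carlo estimate of the mean waiting time at the service station.
function X = discrete_inverse(m, n, x, F)
    // inverse-transform sampling from a tabulated discrete CDF F on support x
    X = zeros(m, n);
    U = grand(m, n, "def");
    for i = 1:m*n
        X(i) = x(find(F >= U(i), 1));
    end
endfunction

function WT = waiting_times(IAT, TS)
    // waiting times for a single-server queue; first arrival at time 0
    n  = length(TS);
    AT = [0, cumsum(IAT)];
    ET = AT(1) + TS(1);                 // ending time of the previous customer
    WT = zeros(1, n);
    for j = 2:n
        ST    = max(AT(j), ET);         // start when both customer and server are ready
        WT(j) = ST - AT(j);
        ET    = ST + TS(j);
    end
endfunction

grand("setsd", 12345);                  // arbitrary seed
xIAT = (1:10) / 10;                     // 0.1, 0.2, ..., 1.0
FIAT = [0.05, 0.1, 0.2, 0.35, 0.45, 0.5, 0.7, 0.75, 0.95, 1.0];
xTS  = (1:8) * 0.25;                    // 0.25, 0.50, ..., 2.00
FTS  = [0.1, 0.2, 0.4, 0.7, 0.8, 0.9, 0.95, 1];

nrep   = 1000;                          // number of simulated runs
ncust  = 10;                            // customers per run
meanWT = zeros(nrep, 1);
for r = 1:nrep
    IAT = discrete_inverse(1, ncust-1, xIAT, FIAT);
    TS  = discrete_inverse(1, ncust,   xTS,  FTS);
    meanWT(r) = mean(waiting_times(IAT, TS));
end
printf("Average waiting time over %d runs: %.3f\n", nrep, mean(meanWT));
```

The spread of meanWT across runs, rather than a single table, is what characterizes the stochastic behavior of the station.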
STIXBOX: a rudimentary statistics toolbox
STIXBOX (an abbreviation of statistical toolbox) is a collection of functions that perform
selected statistical and probability calculations. STIXBOX is available for download from the
SCILAB main web page (http://www-rocq.inria.fr/SCILAB/). Instructions for its installation are
provided with the downloaded functions. The package includes a set of help manual pages
that briefly describe the operation of the functions. Once loaded, the manual pages are
available through the main SCILAB Help window.
Probability mass and probability density functions
Probability mass functions, or pmf (for discrete random variables), and probability density
functions, or pdf (for continuous random variables), start with the letter d, e.g., dbeta, dbinom,
etc. Probability mass functions are denoted by $p_X(k) = P[X=k]$, and probability density
functions by $f_X(x)$. Thus, if $X \sim \mathrm{Binomial}(n,p)$ with n = 10, p = 0.5, then $P[X=2] = p_X(2)$ =
dbinom(2,10,0.5). And, if $X \sim \mathrm{Normal}(\mu,\sigma^2)$ with $\mu = 1.5$, $\sigma = 0.2$, then $f_X(1.75)$ =
dnorm(1.75,1.5,0.2). The following probability mass and density functions are defined:
dbeta the beta density function
dbinom the binomial probability function
dchisq the chisquare density function
df The F density function [modified by the author, 2/1/2001]
dgamma the gamma density function
dhypgeo the hypergeometric probability function
dnorm the normal density function [modified by the author, 2/1/2001]
dt the student t density function
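If STIXBOX is not installed, the two values quoted above can be checked directly from the defining formulas, $p_X(k) = \binom{n}{k} p^k (1-p)^{n-k}$ for the binomial pmf and $f_X(x) = e^{-(x-\mu)^2/(2\sigma^2)}/(\sigma\sqrt{2\pi})$ for the normal pdf. The function names below are hypothetical stand-ins written for this sketch, not the STIXBOX functions.

```scilab
// Sketch: binomial pmf and normal pdf from their formulas (base SCILAB only)
function p = binom_pmf(k, n, pr)
    p = factorial(n) ./ (factorial(k) .* factorial(n - k)) .* pr.^k .* (1 - pr).^(n - k);
endfunction

function f = normal_pdf(x, mu, sigma)
    f = exp(-(x - mu).^2 ./ (2*sigma^2)) ./ (sigma*sqrt(2*%pi));
endfunction

p2   = binom_pmf(2, 10, 0.5)         // 45/1024 = 0.0439453125
f175 = normal_pdf(1.75, 1.5, 0.2)    // about 0.9132
```

These should agree with dbinom(2,10,0.5) and dnorm(1.75,1.5,0.2), assuming (as in the example above) that dnorm takes the standard deviation as its third argument.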
Cumulative distribution functions
Cumulative distribution functions (cdf) are referred to as distribution functions when dealing with
continuous variables, or as cumulative probability functions when dealing with discrete variables.
All cdfs in this package start with a p: pbeta, pbinom, etc. Both discrete and continuous cdfs
are denoted by $F_X(x) = P[X \le x]$. Thus, if $X \sim \mathrm{Binomial}(n,p)$ with n = 10, p = 0.5, then $P[X \le 2]$ =
$F_X(2)$ = pbinom(2,10,0.5). And, if $X \sim \mathrm{Normal}(\mu,\sigma^2)$ with $\mu = 1.5$, $\sigma = 0.2$, then $F_X(1.75)$ =
pnorm(1.75,1.5,0.2). The following cumulative distribution functions are defined:
pbeta the beta distribution function
pbinom the binomial cumulative probability function
pchisq the chisquare distribution function
pf The F distribution function
pgamma the gamma distribution function
phypge the hypergeometric cumulative probability function
pnorm the normal distribution function
pt the student t cdf (modified by the author, 2/1/2001)
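Base SCILAB also provides cumulative distribution functions through its cdf* family (cdfbin, cdfnor, cdft, cdfchi, and others). The sketch below reproduces the two examples above with cdfbin and cdfnor; with the "PQ" option each returns $P = F_X(x)$ together with its complement $Q = 1 - P$. For cdfbin, the last two arguments are the success probability p and 1 - p.

```scilab
// P[X <= 2] for X ~ Binomial(10, 0.5); last two arguments are p and 1-p
[P, Q] = cdfbin("PQ", 2, 10, 0.5, 0.5);
disp(P)      // (1 + 10 + 45)/1024 = 0.0546875

// F_X(1.75) for X ~ Normal(1.5, 0.2^2), i.e. Phi(1.25)
[Pn, Qn] = cdfnor("PQ", 1.75, 1.5, 0.2);
disp(Pn)     // about 0.8944
```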
Inverse cumulative distribution functions
Inverse cumulative distribution functions start with q: qbeta, qbinom, etc. If $F_X(q) = P[X \le q] = p$,
then $q = F_X^{-1}(p)$. The value q is also referred to as a quantile of the distribution. The
following inverse cumulative distribution functions are defined:
qbeta the beta inverse distribution function
qbinom the binomial inverse cdf
qchisq the chisquare inverse distribution function
qf The F inverse distribution function
qgamma the gamma inverse distribution function
qhypg the hypergeometric inverse cdf
qnorm the normal inverse distribution function
qt the student t inverse distribution function
quantile empirical quantile (percentile).
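The same quantiles can be obtained in base SCILAB by asking the cdf* functions to solve for the variable instead of the probability (the "X" or "T" option, with P and Q = 1 - P supplied). A sketch for three distributions follows; the probabilities are chosen to match the worked examples given later in this section, so the results can be compared with the qnorm, qt, and qchisq values printed there.

```scilab
// Quantiles with base SCILAB: solve F_X(q) = p for q
p = 0.95;
z = cdfnor("X", 0, 1, p, 1 - p)      // standard normal, about 1.6449
t = cdft("T", 10, 0.99, 0.01)        // Student t, 10 d.o.f., about 2.7638
c = cdfchi("X", 6, 0.90, 0.10)       // chi-square, 6 d.o.f., about 10.6446
```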
Generating synthetic data
The generation of synthetic data that follow a particular distribution can be accomplished
with the following random number generators, whose names begin with r: rbeta, rbinom, etc.
SCILAB already provides function rand, which produces uniformly distributed random numbers
(use help rand for more information). The functions provided by STIXBOX generate random
numbers that follow the distributions suggested by their names. Thus, if you want to generate
n = 10 data values x that follow the normal distribution with $\mu = 0.5$ and $\sigma = 0.1$, use
rnorm(10,0.5,0.1).
rbeta random numbers from the beta distribution
rbinom random numbers from the binomial distribution
rchisq random numbers from the chisquare distribution
rexpweib random numbers from the exponential or weibull distributions
rf random numbers from the F distribution
rgamma random numbers from the gamma distribution
rgeom random numbers from the geometric distribution
rhypg random numbers from the hypergeometric distribution
rjbinom random numbers from the binomial distribution (reject method)
rjgamma generates gamma random deviates (reject method)
rjpoiss random numbers from the poisson distribution (reject method)
rnorm normal random numbers
rjpoiss random numbers from the poisson distribution (renewal method)
rt random numbers from the student t distribution
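Base SCILAB's grand function can also generate several of these distributions without STIXBOX. In the sketch below, grand(10,1,"nor",0.5,0.1) plays the role of rnorm(10,0.5,0.1): it samples the same distribution, although the particular numbers produced will differ. The seed value is arbitrary.

```scilab
// Synthetic data with SCILAB's built-in generator grand
grand("setsd", 7);                      // arbitrary seed, for reproducibility
x = grand(10, 1, "nor", 0.5, 0.1);      // 10 normal values, mean 0.5, std. dev. 0.1
k = grand(200, 1, "bin", 10, 0.35);     // 200 binomial values, n = 10, p = 0.35
m = grand(300, 1, "poi", 14.5);         // 300 Poisson values, mean 14.5
```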
Logistic regression
These functions involve the logistic population growth model (see, for example, Example 8.3,
page 504, in Kottegoda, N.T. and R. Rosso, 1997, Probability, Statistics, and Reliability for
Civil and Environmental Engineers, The McGraw-Hill Companies, Inc., New York).
lodds log odds function.
loddsinv compute the inverse of log odds.
logitfit fit a logistic regression model.
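The log odds of a probability p is $\ln[p/(1-p)]$, and its inverse is the logistic function $1/(1+e^{-y})$. A minimal sketch of both in base SCILAB follows; the names log_odds and log_odds_inv are hypothetical and are not the STIXBOX implementations of lodds and loddsinv, whose argument conventions may differ.

```scilab
// Log odds and its inverse (the logistic function)
function y = log_odds(p)
    y = log(p ./ (1 - p));
endfunction

function p = log_odds_inv(y)
    p = 1 ./ (1 + exp(-y));
endfunction

log_odds(0.5)                    // 0
log_odds_inv(log_odds(0.3))      // 0.3, up to round-off
```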
Statistical graphics
These functions produce a variety of statistical graphics. A normal probability paper plot is
obtained with qqnorm. Probability paper plots are also referred to as Q-Q plots, which is why
the corresponding function names start with qq, e.g., qqgamma, qqnorm, etc. Also of
interest are functions histo and plotsym.
histo plot a histogram
identify identify points on a plot by clicking with the mouse.
pairs pairwise scatter plots (does not work)
plotdens draw a nonparametric density estimate.
plotsym plot with symbols
qqnorm normal probability paper
qqplot plot empirical quantile vs empirical quantile
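A normal probability plot pairs the sorted data with the corresponding standard normal quantiles (normal scores). The sketch below computes those scores with base SCILAB's cdfnor using the plotting positions $(i - 0.5)/n$, which is one common convention among several; qqnorm may use a different one. The function name normal_scores and the five data values are hypothetical.

```scilab
// Sketch: normal scores for a normal probability (Q-Q) plot
function [xs, z] = normal_scores(x)
    n  = length(x);
    xs = gsort(x(:), "g", "i");                          // data in increasing order
    p  = ((1:n)' - 0.5) / n;                             // plotting positions
    z  = cdfnor("X", zeros(n,1), ones(n,1), p, 1 - p);   // standard normal quantiles
endfunction

[xs, z] = normal_scores([2.1 1.7 3.4 2.6 2.9]);
disp([z xs])
```

The pairs can then be drawn with plot(z, xs, "o"); points lying close to a straight line suggest approximately normal data. Note that gsort is used because SCILAB's sort orders values in decreasing order by default.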
Binomial coefficients
bincoef calculates binomial coefficients: $\binom{n}{r} = \dfrac{n!}{r!\,(n-r)!}$
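Evaluating $n!/(r!(n-r)!)$ literally overflows double precision once n exceeds 170, because 171! is larger than the largest representable double. A safer sketch computes the coefficient as a product of r ratios, $\binom{n}{r} = \prod_{i=1}^{r} (n-r+i)/i$. The name binom_coef is hypothetical and unrelated to STIXBOX's bincoef.

```scilab
// C(n, r) as a product of ratios, avoiding large factorials
function c = binom_coef(n, r)
    r = min(r, n - r);                 // symmetry: C(n,r) = C(n,n-r)
    i = 1:r;                           // empty when r = 0, and prod([]) = 1
    c = round(prod((n - r + i) ./ i));
endfunction

binom_coef(10, 2)     // 45
binom_coef(200, 3)    // 1313400
```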
Resampling methods
These methods apply to the process of resampling, by which an attempt is made to remove any
existing bias in the sample. For a quick introduction to the jackknife (so named because the
jackknife, like this method, is a useful tool) and the bootstrap (named after the expression
"lifting oneself by one's bootstraps"), see pp. 116-117 in Kottegoda, N.T. and R. Rosso, 1997,
Probability, Statistics, and Reliability for Civil and Environmental Engineers, The McGraw-Hill
Companies, Inc., New York.
covboot bootstrap estimate of the variance of a parameter estimate.
covjack Jackknife estimate of the variance of a parameter estimate.
stdboot bootstrap estimate of the parameter standard deviation.
stdjack Jackknife estimate of the standard deviation of a parameter.
rboot simulate a bootstrap resample from a sample.
ciboot various bootstrap confidence intervals.
test1b bootstrap t test and confidence interval for the mean.
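To make the jackknife concrete: leaving out one observation at a time gives n estimates $\hat\theta_{(i)}$, and the jackknife standard error is $\sqrt{\frac{n-1}{n}\sum_i (\hat\theta_{(i)} - \bar\theta)^2}$. For the sample mean this reproduces the familiar $s/\sqrt{n}$ exactly, which makes it a convenient check. The sketch uses a hypothetical function name (it is not STIXBOX's stdjack) and the waiting times from the service station example above.

```scilab
// Sketch: jackknife standard error of the sample mean
function se = jack_se_mean(x)
    x = x(:);
    n = length(x);
    theta = zeros(n, 1);
    for i = 1:n
        xi = x([1:i-1, i+1:n]);          // sample with observation i removed
        theta(i) = mean(xi);             // leave-one-out estimate
    end
    se = sqrt((n - 1) / n * sum((theta - mean(theta)).^2));
endfunction

x  = [0 0.6 0.65 0.95 1.2 1.3 1.85 2.1 1.7 2.6];   // waiting times WT
se = jack_se_mean(x)
```

Replacing mean by another statistic inside the loop gives the jackknife standard error of that statistic.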
Tests, confidence intervals, and model estimation
These are functions related to statistical inference. Of interest for this class are the functions
lsfit, test1n, and test2r. Use the help function to obtain additional information on the
functions.
cmpmod compare small linear model versus large one
ciquant nonparametric confidence interval for quantile
kstwo Kolmogorov-Smirnov statistic from two samples (needs function pks)
linreg linear or polynomial regression
lsfit fit a multiple regression model.
lsselect select a predictor subset for regression
test1n tests and confidence intervals based on a normal sample
test1r test for median equals 0 using rank test
test2n tests and confidence intervals based on two normal samples
test2r test for equal location of two samples using rank test
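As an illustration of what a one-sample, normal-theory interval computes, the sketch below forms the usual two-sided $100(1-\alpha)\%$ interval $\bar x \pm t_{n-1,\alpha/2}\, s/\sqrt{n}$ with base SCILAB's cdft. The function name t_interval is hypothetical; test1n's own arguments and output may differ.

```scilab
// Sketch: two-sided 100(1-alpha)% confidence interval for a normal mean
function [lo, hi] = t_interval(x, alpha)
    n  = length(x);
    xb = mean(x);
    s  = sqrt(sum((x - xb).^2) / (n - 1));        // sample standard deviation
    tq = cdft("T", n - 1, 1 - alpha/2, alpha/2);  // upper alpha/2 point of t(n-1)
    lo = xb - tq * s / sqrt(n);
    hi = xb + tq * s / sqrt(n);
endfunction

WT = [0 0.6 0.65 0.95 1.2 1.3 1.85 2.1 1.7 2.6];  // waiting times from the example
[lo, hi] = t_interval(WT, 0.05)
```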
Stixbox demonstrations
These are SCILAB functions that demonstrate some of the functions contained in STIXBOX
stixdemo demonstrate various stixbox routines.
stixtest a second demo for stixbox
Famous datasets
Function getdata is used to load well-known datasets into the SCILAB environment. The data
sets included are:
1 Phosphorus Data
2 Scottish Hill Race Data
3 Salary Survey Data
4 Health Club Data
5 Brain and Body Weight Data
6 Cement Data
7 Colon Cancer Data
8 Growth Data
9 Consumption Function
10 Cost-of-Living Data
11 Demographic Data
To activate function getdata and load data into variable x use:
--> x = getdata()
This function produces a dialog box displaying the list of data sets. The user can type in the
number of the data set and get back some information about the data set before the set is
loaded. The dialog box produced by getdata() is shown below.
The dialog box shows that we have selected data set number 5. Pressing [OK] will load the
data as well as provide information as shown below.
Examples on probability distributions using STIXBOX
!Plot of the standard normal distribution:
-->z=-4:0.1:4;phi=dnorm(z,0,1);plot(z,phi,'z','phi(z)','standard normal')
!Plot of the Student-t distribution for $\nu = 2, 5, 10, 15, 20$
-->t=-4.0:0.1:4;nu=[2,5,10,15,20];
-->for k=1:5,f=dt(t,nu(k));plot2d(t,f,k,'011',' ',[-4 0 4 0.4]), end
-->xtitle('Student t distribution','t','f(t)')
!Plot of the chi-square distribution for nu=5
-->x=0:0.1:20;nu=5;f=dchisq(x,nu);
-->plot(x,f,'x','f(x)','Chi-square distribution, nu=5')
!Plot the F distribution for nu1=5 and nu2=10:
-->x=0:0.1:5;nu1=5;nu2=10;f=df(x,nu1,nu2);
-->plot(x,f,'F','f(F)','F distribution, nu1=5, nu2=10')
!Determining $z_\alpha$, such that $P(Z > z_\alpha) = \alpha$, or $P(Z < z_\alpha) = 1-\alpha$. Also, $z_{\alpha/2}$ is such that $P(Z > z_{\alpha/2}) = \alpha/2$, or $P(Z < z_{\alpha/2}) = 1-\alpha/2$:
-->alpha = 0.05; z_alpha=qnorm(1-alpha), z_alpha2=qnorm(1-alpha/2)
z_alpha = 1.6448536
z_alpha2 = 1.959964
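!Determining $t_{\nu,\alpha}$, such that $P(T > t_{\nu,\alpha}) = \alpha$, or $P(T < t_{\nu,\alpha}) = 1-\alpha$. Also, $t_{\nu,\alpha/2}$ is such that $P(T > t_{\nu,\alpha/2}) = \alpha/2$, or $P(T < t_{\nu,\alpha/2}) = 1-\alpha/2$: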
-->nu=10;alpha=0.01;t_alpha=qt(1-alpha,nu),t_alpha2=qt(1-alpha/2,nu)
t_alpha = 2.7637695
t_alpha2 = 3.1692727
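!Determining $\chi^2_{\nu,\alpha}$, such that $P(X^2 > \chi^2_{\nu,\alpha}) = \alpha$, or $P(X^2 < \chi^2_{\nu,\alpha}) = 1-\alpha$. Similar definitions are used to calculate the values $\chi^2_{\nu,1-\alpha}$, $\chi^2_{\nu,\alpha/2}$, and $\chi^2_{\nu,1-\alpha/2}$: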
-->nu=6;alpha=0.10;X_alpha=qchisq(1-alpha,nu)
X_alpha = 10.644641
-->X_alpha2=qchisq(1-alpha/2,nu)
X_alpha2 = 12.591587
-->nu=6;alpha=0.10;X_alpha=qchisq(alpha,nu)
X_alpha = 2.2041307
-->X_alpha2=qchisq(alpha/2,nu)
X_alpha2 = 1.6353829
!Generating 20 data points that follow the Weibull distribution, and producing a normal
probability plot for such data:
-->x = rexpweib(20,3,5); qqnorm(x,'o')
!Generating 200 data points that follow the binomial distribution. A histogram of the data is
then produced.
-->x = rbinom(200,10,0.35); histo(x);
Other options for function histo(): in the next call we suggest 8 classes (or bins) and set
parameter odd = 0. Function histo() actually chooses 6 classes:
-->histo(x,8,0)
In the next call, we suggest 15 classes, and the odd parameter takes a value odd = 1:
-->histo(x,15,1)
The next call scales area in the histogram bars so that the total area is equal to 1:
-->histo(x,8,0,1)
Exercises
[1]. The probability of a flood occurring in a particular section of a river in a given month is
estimated, from existing records, to be 0.15. (a) What is the probability that there will be
three months of flood in the next year? (b) What is the probability that there will be fewer than
6 months of flood in the next year?
[2]. Data kept at an airport show an average of five cars per minute stopping to leave or pick
up passengers at the terminal curb. (a) What is the probability that in the next minute there
will be 10 or more cars stopping at the curb? (b) What is the probability that there will be no
cars at the curb in a given minute?
[3]. It is known that 25 out of a batch of 200 concrete cylinders were prepared using a defective
type of cement. If a laboratory receives a sample of 15 of those cylinders, what is the
probability that the sample will contain 5 of the defective cylinders?
[4]. If a factory is known to produce 5% defective truck tires, what is the probability that in a
given assembly line the first defective tire is detected after 20 tires have come out of the
assembly line? What is the probability that the first defective tire is detected after 10 tires
have come out of the assembly line?
[5]. The time required to finish the construction of a mile of a particular highway is known to
have a normal distribution with a mean value of 3.5 days and a standard deviation of 0.5 days.
What is the probability that the next mile of the road will be completed between 3 and 5 days?
What is the probability that the construction of the next mile of the road will take more than 7
days?
[6]. Let X represent the intensity of an earthquake on a particular scale. If X is modeled using
the exponential distribution with parameter $\beta = 6.5$, determine the probability that the
intensity of the next earthquake will be 3.5 or less. Also, determine the probability that the
intensity of the earthquake will be between 2.5 and 4.5.
[7]. The gamma distribution, with parameters $\alpha = 1.2$ and $\beta = 0.5$, is used to model the time
of failure (in hours) of an electronic component. Determine the probability that a particular
component will last 100 hours or more. Determine the probability that the component will last
less than 2 hours.
[8]. If the wind velocity in miles per hour near a harbor is assumed to follow a Weibull
distribution with parameters $\alpha = 2$ and $\beta = 3$, determine the probability of the wind velocity
being between 15 and 75 mph. Also, determine the probability of the wind velocity being
larger than 10 mph.
[9]. For a large value of n, the Binomial distribution can be approximated by the normal
distribution with parameters $\mu = np$, $\sigma = \sqrt{np(1-p)}$. Suppose that you receive a shipment of
1000 resistors produced by a machine that is known to produce 0.5% defective resistors. What is
the probability that there will be more than 200 defective resistors in the shipment by using:
(a) the normal distribution approximation to the Binomial distribution, and (b) the Poisson
distribution approximation to the Binomial distribution?
[10]. Plot the probability mass function, $f_X(x)$, and the cumulative distribution function, $F_X(x)$,
for the following discrete distributions:
(a) Binomial with n = 20, p = 0.25 (b) Binomial with n = 20, p = 0.50
(c) Binomial with n = 20, p = 0.75 (d) Poisson with $\lambda = 5.0$, plot for x = 0, 1, 2, …, 10
(e) Geometric with p = 0.25, for x = 1, 2, …, 10 (f) Geometric with p = 0.50, for x = 1, 2, …, 10
(g) Geometric with p = 0.75, for x = 1, 2, …, 10 (h) Hypergeometric with N = 100, n = 20, a = 40
(i) Hypergeometric with N = 40, n = 10, a = 20 (j) Hypergeometric with N = 120, n = 80, a = 10
[11]. Let X be a discrete random variable that follows the binomial distribution with
parameters n and p. Let $P_0 = P(X \le x)$. Calculate:
(a) $P_0$ given n = 20, p = 0.35, x = 5 (b) n given p = 0.25, x = 8, $P_0 = 0.80$
(c) p given n = 25, x = 20, $P_0 = 0.75$ (d) x given n = 10, p = 0.80, $P_0 = 0.30$
[12]. Plot the probability density function, $f_X(x)$, and the cumulative distribution function,
$F_X(x)$, for the following continuous distributions:
(a) Gamma with $\alpha = 0.5$, $\beta = 1.5$ (b) Gamma with $\alpha = 2$, $\beta = 3$
(c) Beta with $\alpha = 0.5$, $\beta = 1.5$ (d) Beta with $\alpha = 3$, $\beta = 2$
(e) Weibull with $\alpha = 0.5$, $\beta = 1.5$ (f) Weibull with $\alpha = 2$, $\beta = 2$
(g) Uniform with a = 2, b = 6 (h) Uniform with a = -3, b = 3
(i) Exponential with $\beta = 12.5$ (j) Exponential with $\beta = 4.8$
(k) Normal with $\mu = 5$, $\sigma = 5$ (l) Normal with $\mu = 150$, $\sigma = 25$
(m) Student t with $\nu = 4$ (n) Student t with $\nu = 12$
(o) Chi-square with $\nu = 4$ (p) Chi-square with $\nu = 12$
(q) F distribution with $\nu_N = 4$, $\nu_D = 10$ (r) F distribution with $\nu_N = 4$, $\nu_D = 10$
[13]. Let X be a continuous random variable that follows the Gamma probability distribution
with parameters $\alpha$ and $\beta$. Let $P_0 = P(X \le x)$. Calculate:
(a) $P_0$ given $\alpha = 2$, $\beta = 3$, x = 3.5 (b) $\alpha$ given $P_0 = 0.40$, $\beta = 1.5$, x = 1.2
(c) $\beta$ given $P_0 = 0.60$, $\alpha = 5$, x = 10.5 (d) x given $P_0 = 0.20$, $\alpha = 10.5$, $\beta = 0.3$
[14]. Let X be a continuous random variable that follows the Beta probability distribution with
parameters $\alpha$ and $\beta$. Let $P_0 = P(X \le x)$. Calculate:
(a) $P_0$ given $\alpha = 2$, $\beta = 3.5$, x = 0.35 (b) $\alpha$ given $P_0 = 0.40$, $\beta = 2.3$, x = 0.76
(c) $\beta$ given $P_0 = 0.60$, $\alpha = 2.5$, x = 0.45 (d) x given $P_0 = 0.20$, $\alpha = 10.5$, $\beta = 0.3$
[15]. Let T be a continuous random variable that follows the Student t distribution with $\nu$ degrees
of freedom. Let $P_0 = P(T \le t)$. Calculate:
(a) $P_0$ given $\nu = 10$, t = 1.5
(b) $\nu$ given $P_0 = 0.40$, t = -0.8
(c) t given $P_0 = 0.20$, $\nu = 8$
[16]. Let $X^2$ be a continuous random variable that follows the chi-square distribution with $\nu$
degrees of freedom. Let $P_0 = P(X^2 \le \chi^2)$. Calculate:
(a) $P_0$ given $\nu = 6$, $\chi^2 = 2.25$
(b) $\nu$ given $P_0 = 0.40$, $\chi^2 = -0.8$
(c) $\chi^2$ given $P_0 = 0.20$, $\nu = 12$
[17]. Let F be a continuous random variable that follows the F distribution with $\nu_N$ degrees of
freedom in the numerator and $\nu_D$ degrees of freedom in the denominator. Let $P_0 = P(F \le F)$.
Calculate:
(a) $P_0$ given $\nu_N = 4$, $\nu_D = 10$, F = 2.5 (b) $\nu_N$ given $P_0 = 0.40$, $\nu_D = 15$, F = 3.2
(c) $\nu_D$ given $P_0 = 0.60$, $\nu_N = 3$, F = 0.45 (d) F given $P_0 = 0.20$, $\nu_N = 8$, $\nu_D = 12$
[18]. The following data represent measurements of the diameter of a cylinder produced for a
precision mechanism:
232. 248. 242. 250. 239. 244. 265. 262. 259. 236.
246. 308. 221. 275. 261. 217. 260. 273. 228. 269.
260. 247. 228. 274. 205. 254. 230. 252. 263. 255.
244. 264. 243. 255. 261. 236. 226. 264. 260. 265.
267. 243. 270. 275. 260. 281. 240. 257. 268. 231.
(a) Use function histnorm with a suitable number of classes to plot a histogram of the data as
well as the corresponding normal curve. (b) Use function normplot to produce a normal
probability plot of the data. (c) Based on these two plots, how well do the data follow the
normal distribution?
[19]. The following data set represents the time to failure, in years, of light bulbs.
1.39 1.07 3.22 3.67 .55 .81 1.22 1.26 .05 1.54
.97 1.01 .44 1.97 1.9 .89 3.25 .85 1.04 .43
1.33 .82 2.04 1.02 .53 .13 2.06 2.96 1.96 1.5
3.05 .42 1.17 1.72 2.68 .56 2.13 1.56 2.09 1.26
3.21 .74 3.04 2.74 .83 .79 1.56 1.55 .96 1.23
[20]. The following data set represents the yearly rainfall depth, in mm, recorded at a certain
location:
126. 82.9 41.5 4.35 346. 102. 830. 12.8 366. 471.
408. 189. 646. 7.82 313. 17.4 165. 24.5 32.6 39.3
277. 13.7 52.3 171. 314. 60.6 29.1 468. 887. 44.5
135. 215. 106. 201. 51. 43. 335. 59.4 174. 870.
[21]. The following data set represents the number of vehicles stopping at a service station in a
given hour:
3. 5. 6. 4. 5. 9. 4. 4. 11. 4.
4. 8. 5. 4. 4. 6. 7. 4. 7. 8.
6. 9. 10. 7. 4. 3. 5. 9. 9. 11.
6. 5. 9. 12. 11. 5. 13. 8. 10. 6.
4. 5. 9. 8. 7. 5. 3. 6. 5. 5.
8. 3. 11. 4. 5. 9. 5. 1. 8. 6.
[22]. Generate data sets consisting of k values that follow the indicated distribution with the
parameters listed below. Use functions histnorm and normplot to produce a histogram and a
normal probability plot of the data. How well do the data thus generated follow the normal
distribution based on the histogram and probability plot?
(a) Binomial, k = 200, n = 30, p = 0.7
(b) Poisson, k = 300, $\lambda = 14.5$
(c) Beta, k = 150, $\alpha = 3.5$, $\beta = 5.2$
(d) Gamma, k = 100, $\alpha = 3.5$, $\beta = 5.2$
(e) Exponential, k = 500, $\beta = 5.75$
(f) Normal, k = 180, $\mu = 5.75$, $\sigma = 1.2$
(g) Chi-square, k = 230, $\nu = 5$
(h) F-distribution, k = 350, $\nu_N = 5$, $\nu_D = 5$
(i) Uniform integer, k = 125, a = -50, b = 50
(j) Uniform real, k = 200, a = 5.5, b = 17.5
(k) Weibull, k = 200, $\alpha = 7.2$, $\beta = 2.1$
(l) Student's t, k = 150, $\nu = 12$
(m) Log-normal, k = 200, $\mu_{\ln X} = 1.2$, $\sigma_{\ln X} = 0.5$
[23]. Generate data sets consisting of 250 values that follow the discrete distribution
described by the following probability mass function:
x          1.2    2.3    4.1    5.2    6.1    7.2    8.4    9.3    11.1
$f_X(x)$   0.04   0.08   0.12   0.16   0.08   0.04   0.20   0.24   0.04
Use functions histnorm and normplot to produce a histogram and a normal probability plot of
the data. How well do the data thus generated follow the normal distribution based on the
histogram and probability plot?
[24]. Function service was developed to simulate the traffic through a service station. Use
function service to produce a simulation of traffic through a service station that takes as input
49 values of the inter-arrival time (IAT) and 50 values of the time of service (TS) generated from
the following probability mass functions:
 x = IAT   $f_X(x)$     x = TS   $f_X(x)$
   0.2      0.03         0.4      0.05
   0.4      0.14         0.8      0.15
   0.6      0.08         1.2      0.35
   0.8      0.12         1.6      0.25
   1.0      0.23         2.0      0.15
   1.2      0.10         2.4      0.05
   1.4      0.05
   1.6      0.05
   1.8      0.10
   2.0      0.10
Use functions histnorm and normplot to produce a histogram and a normal probability plot of
the waiting time (WT) and number of customers waiting (NW). How well do the WT and NW
data follow the normal distribution?
[25]. One-dimensional random walk. Consider a particle that moves along a straight line
subject to a random motion. The particle starts at $x_1 = 0$ and moves to position $x_2 = x_1 + \Delta x_1$,
where $\Delta x_1$ is a random number. The next position of the particle is $x_3 = x_2 + \Delta x_2$, where $\Delta x_2$ is a
second random number. Subsequent positions of the particle are given by $x_{k+1} = x_k + \Delta x_k$. The
random numbers used must include both positive and negative values so that the particle can
move forward and backward.
(a) Plot the position $x_k$ vs. k for a one-dimensional random walk that involves 300
displacements $\Delta x_k$ generated from a normal distribution with $\mu = 0$ and $\sigma = 1$.
(b) Plot the position $x_k$ vs. k for a one-dimensional random walk that involves 300
displacements $\Delta x_k$ generated from a uniform distribution between -1 and 1.
[26]. Two-dimensional random walk. A two-dimensional random walk involves the displacement
of a particle from a point (x
k
,y
k
) to a point (x
k+1
,y
k+1
) so that
x
k+1
= x
k
+ r
k
cos(
k
), and x
k+1
= x
k
+ r
k
sin(
k
),
where the values r
k
and
k
are random numbers.
(a) Plot the two-dimensional random walk that results from 200 values of $r_k$ with a normal
distribution with mean 1 and standard deviation 0.2, and 200 values of $\theta_k$ uniformly
distributed between 0 and $2\pi$.
(b) Plot the two-dimensional random walk that results from 100 values of $r_k$ with a Weibull
distribution with parameters $\alpha = 2$ and $\beta = 3$, and 200 values of $\theta_k$ uniformly distributed
between 0 and $2\pi$.
(c) Plot the two-dimensional random walk that results from 150 values of $r_k$ with a Gamma
distribution with parameters $\alpha = 0.2$ and $\beta = 1.3$, and 200 values of $\theta_k$ normally
distributed with mean $\pi$ and standard deviation $\pi/2$.
(d) Plot the two-dimensional random walk that results from 250 values of $r_k$ with a Beta
distribution with parameters $\alpha = 2$ and $\beta = 3$, and 200 values of $\theta_k$ uniformly distributed
between 0 and $2\pi$.
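Each step adds the vector $(r_k \cos\theta_k, r_k \sin\theta_k)$ to the current position, so both coordinates are again cumulative sums. The SCILAB sketch below covers case (a) only and assumes the walk starts at the origin, which the problem does not specify; the function randwalk2d is hypothetical. Because every step pairs one value of $r_k$ with one value of $\theta_k$, the sketch requires both vectors to have the same length.

```scilab
// Hypothetical helper (not a SCILAB built-in): 2-D random walk starting at (0,0).
// r, theta : vectors of step lengths r_k and angles theta_k (radians), same length
// x, y     : column vectors of positions, with
//            x(k+1) = x(k) + r(k)*cos(theta(k)), y(k+1) = y(k) + r(k)*sin(theta(k))
function [x, y] = randwalk2d(r, theta)
    r = r(:);
    theta = theta(:);
    if size(r, "*") <> size(theta, "*") then
        error("randwalk2d: r and theta must contain the same number of values");
    end
    x = [0; cumsum(r .* cos(theta))];
    y = [0; cumsum(r .* sin(theta))];
endfunction

// Case (a): 200 step lengths, normal with mean 1 and standard deviation 0.2,
// and 200 angles uniformly distributed between 0 and 2*pi
n = 200;
r = 1 + 0.2*rand(n, 1, "normal");
theta = 2*%pi*rand(n, 1, "uniform");
[x, y] = randwalk2d(r, theta);

clf();
plot2d(x, y);
xtitle("Two-dimensional random walk, case (a)", "x", "y");
```

The other cases change only how r and theta are generated. SCILAB's grand function, for instance, offers "gam" and "bet" options, but its parameterization should be checked against the definitions of $\alpha$ and $\beta$ used earlier in this document.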
[27]. The following table shows the annual maximum flow for the Ganga River in India
measured at a specific station.
Year Q (m³/s) Year Q (m³/s) Year Q (m³/s) Year Q (m³/s)
1885 7241 1907 7546 1929 4545 1951 4458
1886 9164 1908 11504 1930 5998 1952 3919
1887 7407 1909 8335 1931 3470 1953 5470
1888 6870 1910 15077 1932 6155 1954 5978
1889 9855 1911 6493 1933 5267 1955 4644
1890 11887 1912 8335 1934 6193 1956 6381
1891 8827 1913 3579 1935 5289 1957 4548
1892 7546 1914 9299 1936 3320 1958 4056
1893 8498 1915 7407 1937 3232 1959 4493
1894 16757 1916 4726 1938 3525 1960 3884
1895 9680 1917 8416 1939 2341 1961 4855
1896 14336 1918 4668 1940 2429 1962 5760
1897 8174 1919 6296 1941 3154 1963 9192
1898 8953 1920 8174 1942 6650 1964 3024
1899 7546 1921 9079 1943 4442 1965 2509
1900 6652 1922 7407 1944 4229 1966 4741
1901 11409 1923 5482 1945 5101 1967 5919
1902 9164 1924 19136 1946 4629 1968 3789
1903 7404 1925 9680 1947 4345 1969 4546
1904 8579 1926 3698 1948 4890 1970 3842
1905 9362 1927 7241 1949 3619 1971 4542
1906 7092 1928 3698 1950 5899
(a) Use function histnorm with a suitable number of classes to plot a histogram of the data
as well as the corresponding normal curve.
(b) Use function normplot to produce a normal probability plot of the data.
(c) Based on these two plots, how well do the data follow the normal distribution?
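As a rough cross-check that relies only on SCILAB built-ins, the sketch below pairs the sorted data with standard normal scores computed by cdfnor; data that follow a normal distribution plot close to a straight line. The helper normscores is hypothetical, and the plotting positions $p_i = (i - 0.5)/n$ are an assumed convention that may differ from the one used by normplot.

```scilab
// Hypothetical helper (not a SCILAB built-in): normal scores for a data vector.
// xs : data sorted in increasing order
// z  : standard normal quantiles at plotting positions p_i = (i - 0.5)/n
function [xs, z] = normscores(x)
    xs = gsort(x(:), "g", "i");          // sort in increasing order
    n = size(xs, "*");
    p = ((1:n)' - 0.5)/n;                // assumed plotting-position formula
    z = cdfnor("X", zeros(n, 1), ones(n, 1), p, 1 - p);
endfunction

// Example usage, assuming the annual maximum flows are stored in a vector Q:
// [Qs, z] = normscores(Q);
// clf(); plot2d(z, Qs, -1); xtitle("Normal probability plot", "normal score", "Q (m^3/s)");
```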
The following problems require that you load the functions from the Stixbox SCILAB
toolbox.
[28]. Using function getdata() load data set number 1, described as:
__________________________________________________________________________________
************************ Phosphorus Data **********************************
Source: Snedecor, G. W. and Cochran, W. G. (1967), Statistical Methods,
(6th Edition), Iowa State University, Ames, Iowa, p. 384.
Taken From: Chatterjee and Hadi (1988), p. 82.
Dimension: 18 observations on 3 variables
Description: An investigation of the source from which corn plants obtain
their phosphorus was carried out. Concentrations of phosphorus
in parts per million in each of 18 soils were measured.
Column Description
1 Concentrations of inorganic phosphorus in the soil
2 Concentrations of organic phosphorus in the soil
3 Phosphorus content of corn grown in the soil at 20 degrees C
__________________________________________________________________________________
(a) Separate the three columns of data into vectors x, y, and z, and use the user-defined
function describe to obtain statistics of each of the columns of data.
(b) Use Stixbox function histo to obtain histograms of each of the columns of data.
(c) Use Stixbox function qqnorm to obtain a normal probability plot of each of the data
columns.
[29]. Using function getdata() load data set number 1, described as:
*********************** Scottish Hill Race Data *************************
(...lines removed...)
Column Definition
1 Distance (miles)
2 Climb (ft)
3 Time (seconds)
__________________________________________________________________________________
(a) Separate the three columns of data into vectors x, y, and z, and use the user-defined
function describe to obtain statistics of each of the columns of data.
(b) Use Stixbox function histo to obtain histograms of each of the columns of data.
(c) Use Stixbox function qqnorm to obtain a normal probability plot of each of the data
columns.
REFERENCES (for all SCILAB documents at InfoClearinghouse.com)
Abramowitz, M. and I.A. Stegun (editors), 1965,"Handbook of Mathematical Functions with Formulas, Graphs, and
Mathematical Tables," Dover Publications, Inc., New York.
Arora, J.S., 1985, "Introduction to Optimum Design," Class notes, The University of Iowa, Iowa City, Iowa.
Asian Institute of Technology, 1969, "Hydraulic Laboratory Manual," AIT - Bangkok, Thailand.
Berge, P., Y. Pomeau, and C. Vidal, 1984,"Order within chaos - Towards a deterministic approach to turbulence," John
Wiley & Sons, New York.
Bras, R.L. and I. Rodriguez-Iturbe, 1985,"Random Functions and Hydrology," Addison-Wesley Publishing Company,
Reading, Massachussetts.
Brogan, W.L., 1974,"Modern Control Theory," QPI series, Quantum Publisher Incorporated, New York.
Browne, M., 1999, "Schaum's Outline of Theory and Problems of Physics for Engineering and Science," Schaum's
outlines, McGraw-Hill, New York.
Farlow, Stanley J., 1982, "Partial Differential Equations for Scientists and Engineers," Dover Publications Inc., New
York.
Friedman, B., 1956 (reissued 1990), "Principles and Techniques of Applied Mathematics," Dover Publications Inc., New
York.
Gomez, C. (editor), 1999, Engineering and Scientific Computing with Scilab, Birkhuser, Boston.
Gullberg, J., 1997, "Mathematics - From the Birth of Numbers," W. W. Norton & Company, New York.
Harman, T.L., J. Dabney, and N. Richert, 2000, "Advanced Engineering Mathematics with MATLAB - Second edition,"
Brooks/Cole - Thompson Learning, Australia.
Harris, J.W., and H. Stocker, 1998, "Handbook of Mathematics and Computational Science," Springer, New York.
Hsu, H.P., 1984, "Applied Fourier Analysis," Harcourt Brace Jovanovich College Outline Series, Harcourt Brace
Jovanovich, Publishers, San Diego.
Journel, A.G., 1989, "Fundamentals of Geostatistics in Five Lessons," Short Course Presented at the 28th International
Geological Congress, Washington, D.C., American Geophysical Union, Washington, D.C.
Julien, P.Y., 1998,Erosion and Sedimentation, Cambridge University Press, Cambridge CB2 2RU, U.K.
Keener, J.P., 1988, "Principles of Applied Mathematics - Transformation and Approximation," Addison-Wesley
Publishing Company, Redwood City, California.
Kitanidis, P.K., 1997,Introduction to Geostatistics - Applications in Hydogeology, Cambridge University Press,
Cambridge CB2 2RU, U.K.
Koch, G.S., Jr., and R. F. Link, 1971, "Statistical Analysis of Geological Data - Volumes I and II," Dover Publications,
Inc., New York.
Korn, G.A. and T.M. Korn, 1968, "Mathematical Handbook for Scientists and Engineers," Dover Publications, Inc., New
York.
Kottegoda, N. T., and R. Rosso, 1997, "Probability, Statistics, and Reliability for Civil and Environmental Engineers,"
The Mc-Graw Hill Companies, Inc., New York.
Kreysig, E., 1983, "Advanced Engineering Mathematics - Fifth Edition," John Wiley & Sons, New York.
Lindfield, G. and J. Penny, 2000, "Numerical Methods Using Matlab," Prentice Hall, Upper Saddle River, New Jersey.
Magrab, E.B., S. Azarm, B. Balachandran, J. Duncan, K. Herold, and G. Walsh, 2000, "An Engineer's Guide to
MATLAB", Prentice Hall, Upper Saddle River, N.J., U.S.A.
McCuen, R.H., 1989,Hydrologic Analysis and Design - second edition, Prentice Hall, Upper Saddle River, New Jersey.
Middleton, G.V., 2000, "Data Analysis in the Earth Sciences Using Matlab," Prentice Hall, Upper Saddle River, New
Jersey.
Montgomery, D.C., G.C. Runger, and N.F. Hubele, 1998, "Engineering Statistics," John Wiley & Sons, Inc.
Newland, D.E., 1993, "An Introduction to Random Vibrations, Spectral & Wavelet Analysis - Third Edition," Longman
Scientific and Technical, New York.
Nicols, G., 1995, Introduction to Nonlinear Science, Cambridge University Press, Cambridge CB2 2RU, U.K.
Parker, T.S. and L.O. Chua, , "Practical Numerical Algorithms for Chaotic Systems, 1989, Springer-Verlag, New York.
Peitgen, H-O. and D. Saupe (editors), 1988, "The Science of Fractal Images," Springer-Verlag, New York.
Peitgen, H-O., H. Jrgens, and D. Saupe, 1992, "Chaos and Fractals - New Frontiers of Science," Springer-Verlag, New
York.
Press, W.H., B.P. Flannery, S.A. Teukolsky, and W.T. Vetterling, 1989, Numerical Recipes - The Art of Scientific
Computing (FORTRAN version), Cambridge University Press, Cambridge CB2 2RU, U.K.
Raghunath, H.M., 1985, "Hydrology - Principles, Analysis and Design," Wiley Eastern Limited, New Delhi, India.
Recktenwald, G., 2000, "Numerical Methods with Matlab - Implementation and Application," Prentice Hall, Upper
Saddle River, N.J., U.S.A.
Rothenberg, R.I., 1991, "Probability and Statistics," Harcourt Brace Jovanovich College Outline Series, Harcourt Brace
Jovanovich, Publishers, San Diego, CA.
Sagan, H., 1961,"Boundary and Eigenvalue Problems in Mathematical Physics," Dover Publications, Inc., New York.
Spanos, A., 1999,"Probability Theory and Statistical Inference - Econometric Modeling with Observational Data,"
Cambridge University Press, Cambridge CB2 2RU, U.K.
Spiegel, M. R., 1971 (second printing, 1999), "Schaum's Outline of Theory and Problems of Advanced Mathematics for
Engineers and Scientists," Schaum's Outline Series, McGraw-Hill, New York.
Tanis, E.A., 1987, "Statistics II - Estimation and Tests of Hypotheses," Harcourt Brace Jovanovich College Outline
Series, Harcourt Brace Jovanovich, Publishers, Fort Worth, TX.
Tinker, M. and R. Lambourne, 2000, "Further Mathematics for the Physical Sciences," John Wiley & Sons, LTD.,
Chichester, U.K.
Tolstov, G.P., 1962, "Fourier Series," (Translated from the Russian by R. A. Silverman), Dover Publications, New York.
Tveito, A. and R. Winther, 1998, "Introduction to Partial Differential Equations - A Computational Approach," Texts in
Applied Mathematics 29, Springer, New York.
Urroz, G., 2000, "Science and Engineering Mathematics with the HP 49 G - Volumes I & II", www.greatunpublished.com,
Charleston, S.C.
Urroz, G., 2001, "Applied Engineering Mathematics with Maple", www.greatunpublished.com, Charleston, S.C.
Winnick, J., , "Chemical Engineering Thermodynamics - An Introduction to Thermodynamics for Undergraduate
Engineering Students," John Wiley & Sons, Inc., New York.