
Manual to the library of fit tools

Wiebe R. Pestman

30 August 2005
Foreword

The library of fit tools that is offered here works under the mathematical program ‘Scilab’. The
library is free in the sense that you are free to use it, free to change it and free to redistribute it.
However:

Please take notice of the fact that the library of fit tools that is offered here
comes with absolutely no warranty!

For suggestions, comments or special wishes as to the library, please contact the author at
w.pestman@bio.uu.nl
Contents

1 Installation
1.1 Installation of Scilab
1.2 The Scilab working directory
1.3 Installation of the library of fit tools under Scilab

2 How to run the fit tools
2.1 How to run the fit tool fitgam
2.2 How to run the fit tool fitbet
2.3 How to run the fit tool fitbpr
2.4 How to run the fit tool fitweibull
2.5 How to run the fit tool fitnor
2.6 How to run the fit tool fitlognor
2.7 How to run the fit tool fitMIXnor
2.8 How to run the fit tool fitMIXgam

3 How to use the plot utilities in the fit library
3.1 Making histograms with Scilab
3.2 The plot functions plotgam, plotbet, plotbpr, ....
3.3 The plot functions plotMIXnor and plotMIXgam

4 How to get your data into Scilab and save your work
4.1 Data that is organized in just one column
4.2 Data that is organized in several columns of equal length
4.3 Data that is organized in several columns of unequal length
4.4 Saving your Scilab work

1 Installation

1.1 Installation of Scilab

Step 1 First download an installer for Scilab from

http://scilabsoft.inria.fr

There are versions available for various flavours of Linux, BSD, MacOS and Windows. For
Windows a binary version of an installer will probably be preferable to most users.

Step 2 Install the program on your machine. Should things go wrong during the installation
(which is not likely to happen), then keep in mind that installing from source code is also
a possibility!

Detailed information about Scilab can be found for example in [1], [3], [4]. A lot of free docu-
mentation can be found by just visiting the Scilab home page at http://scilabsoft.inria.fr/.

1.2 The Scilab working directory

Once installed you can start Scilab. To figure out in which directory Scilab starts, the so-called
‘present working directory’ at startup, pass the command

pwd

Scilab then returns the path to this important directory. When Scilab was installed under
Windows XP this path is by default something like

C:\Documents and Settings\username\Desktop

On a Linux machine it will be something like

/home/username

The directory in which Scilab starts is important in daily use. You will frequently use it when
running the program. It is therefore important to let Scilab start in a convenient directory at a
convenient place in the directory tree. On a Windows machine, for example, you may wish to have
Scilab start in a directory with path

C:\Documents and Settings\username\scilab\work

To bring this about, first create the directories ‘scilab’ and ‘work’ in the hierarchy shown in the
path above. Then right-click on the Scilab icon on your computer’s desktop and choose ‘Properties’
in the panel that appears. A second panel will then appear.

In the field titled ‘Start in’, put in the path to your intended Scilab working directory. Then click
the button ‘OK’. From now on Scilab will start in the directory you specified.
The equivalent of the above on a Linux machine is to have Scilab start in a directory

/home/username/scilab/work

In Linux there are many ways to let Scilab start in such a directory. Perhaps the easiest way, when
you are running a recent KDE desktop, is to just mimic the Windows procedure described above.
Another way is to change to the desired directory in a terminal program and then pass there
the command

scilab

Scilab will then start in the directory in question.
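
The working directory can also be inspected and changed from inside a running session. The lines below are a minimal sketch, assuming the built-in functions pwd and chdir. TMPDIR (the temporary directory of Scilab) is used here only because it is certain to exist; replace it by your own path.

```scilab
// Remember the directory in which Scilab started.
startdir = pwd();
disp(startdir)

// Move to another directory for the rest of the session.
// TMPDIR stands in for a path such as /home/username/scilab/work.
chdir(TMPDIR);
disp(pwd())

// Go back to the directory Scilab started in.
chdir(startdir);
```

A change made this way lasts only for the current session; the ‘Start in’ setting described above is what determines the directory at startup.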

1.3 Installation of the library of fit tools under Scilab

How to get the directory with fit tools working as a library in Scilab? Just carry out the next two
steps to the letter:

Step 1 Copy the directory ‘fit’ to the Scilab working directory.

Step 2 To generate a library, pass the next command

genlib("fittools","fit")

By this action a library named ‘fit’ is created and loaded into Scilab, that is to say, ready for
use. The catalog of the library is named ‘fittools’.

The next time you start Scilab it is not necessary to create the library again (it was already created
in Step 2). The existing library can now be loaded into Scilab like this:

load fit/lib

By passing the command

fittools

Scilab returns, as a kind of a catalog, a list of the tools that are available in the library.

2 How to run the fit tools

2.1 How to run the fit tool fitgam

In this section it is explained how to use the fit tool fitgam. The function fitgam fits a gamma
distribution to empirical data according to the maximum likelihood principle (for details of this
principle, see for example [2]). The probability density of a gamma distribution is of the form:

$$f(t) = \frac{b^{a}}{\Gamma(a)}\, t^{a-1} e^{-bt} \qquad \text{for } t > 0$$

In this expression a is called the shape parameter and b the scale parameter; both parameters must
be positive. (Note that b enters the density as a rate: the mean of the distribution is a/b.) The
parameters a and b are to be adapted to your data so as to obtain the best possible fit.
How to bring this about? First, of course, there must be some data available to work with. To get
some artificial data into Scilab, pass the next command:

x=grand(500,1,"gam",5,2);

By this action Scilab generates a sample of size 500 from a gamma distributed population with
parameters a = 5 and b = 2, just by simulation. Be sure the fit library is loaded and then try to
retrieve your gamma population parameters by passing the command

z=fitgam(x,"verbose")

Compare the output to the ‘population’ parameters you have set in creating the variable x. Also
try the above in the non-verbose mode of the function fitgam:

p=fitgam(x)
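
To get a feeling for what ‘maximum likelihood’ means here, the log-likelihood of a sample under the gamma density given above can be evaluated by hand. The sketch below uses only built-in Scilab functions; the helper gamloglik is a hypothetical name introduced for this illustration and is not part of the fit library.

```scilab
// Log-likelihood of a sample x under the gamma density
//   f(t) = b^a/gamma(a) * t^(a-1) * exp(-b*t)
function L = gamloglik(x, a, b)
    n = length(x);
    L = n*(a*log(b) - gammaln(a)) + (a-1)*sum(log(x)) - b*sum(x);
endfunction

x = grand(500,1,"gam",5,2);

// The value at the population parameters is usually much higher
// than the value at parameters far away from them.
Ltrue = gamloglik(x, 5, 2);
Lfar  = gamloglik(x, 2, 5);
disp([Ltrue, Lfar])
```

A maximum likelihood fit searches for the pair (a, b) at which this quantity is as large as possible.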

As a second experiment, try the command

x=grand(600,10,"gam",5,2);

to get 10 samples of size 600 from a gamma distributed population with parameters a = 5 and
b = 2 loaded into Scilab. The samples are loaded into Scilab as a matrix x with 600 rows and 10
columns. Again, try to retrieve your parameters by passing the command

p=fitgam(x)

The function fitgam then fits the matrix x columnwise. The results are stored in a matrix p that is
shown on the screen. There is no verbose mode if the input x is a matrix.
Next, try another experiment to learn something more about the function fitgam. First create three
data vectors u, v, w like so:

u=grand(500,1,"gam",5,2);
v=grand(600,1,"gam",7,2);
w=grand(400,1,"gam",4,3);

Note that the vectors u, v, w don’t have the same size. For this reason they cannot be stored in a
matrix. However, they can be stored in a list x like this

x=list(u,v,w);

Next, run the function fitgam on the list x by passing the command

p=fitgam(x)
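
If lists are new to you, the following sketch (built-in functions only) shows how the entries of such a list can be inspected one by one.

```scilab
u=grand(500,1,"gam",5,2);
v=grand(600,1,"gam",7,2);
w=grand(400,1,"gam",4,3);
x=list(u,v,w);

// length(x) is the number of entries in the list; x(k) is the k-th entry.
for k=1:length(x)
    mprintf("entry %d: %d observations, mean %f\n", k, length(x(k)), mean(x(k)));
end
```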

Given some data vector x with n entries, how to get interval estimates for your parameters? This
can be done by means of bootstrapping. First create from x a so-called bootstrap matrix X with,
say, 500 columns:

X=bootstrap(x,500)

Now each column of X is a resampled version of x. So every column of X was obtained by ran-
domly drawing n elements (with replacement) from x. Apply the function fitgam to the bootstrap
matrix X like so

p=fitgam(X);

A matrix p with 500 rows and 4 columns is then returned. To obtain (for example) an interval
estimate for the shape parameter, store the first column from p in a vector like this

aboot=p(:,1)

Next determine the 5th percentile a1 and the 95th percentile a2 of the vector aboot like so

a1=quantile(aboot,0.05);
a2=quantile(aboot,0.95);

Now, roughly, the interval (a1 , a2 ) can be regarded as a 90% confidence interval for the shape
parameter in the gamma fit. A confidence interval for the scale parameter can be obtained in a
similar way.
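
The resampling idea itself can be sketched with built-in functions only. The lines below are an illustration under stated assumptions, not necessarily the way the library function bootstrap is implemented; for simplicity the statistic is the sample mean rather than a fitted parameter, and the percentiles are read off from the sorted values.

```scilab
x = grand(200,1,"gam",5,2);   // some data vector
n = length(x);
B = 500;                      // number of bootstrap columns

// n-by-B matrix of random indices between 1 and n (drawn with replacement)
idx = grand(n,B,"uin",1,n);

// Every column of X is a resampled version of x
X = matrix(x(idx), n, B);

// Statistic of interest per column, here simply the sample mean
m = mean(X,"r");

// Rough 90% interval from the sorted bootstrap values
s = gsort(m,"g","i");
mlo = s(ceil(0.05*B));
mhi = s(ceil(0.95*B));
disp([mlo, mean(x), mhi])
```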

If you want to run the function fitgam on your own data then you’ll have to get your data into
Scilab first, either as a vector, a matrix or a list. Read Section 4 if you don’t know how to do this.

2.2 How to run the fit tool fitbet

In this section it is explained how to use the fit tool fitbet. The function fitbet fits a beta distribution
to empirical data according to the maximum likelihood principle (for details of this principle, see
for example [2]). The probability density of a beta distribution is of the form:

$$f(t) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, t^{a-1} (1-t)^{b-1} \qquad \text{for } 0 < t < 1$$

In this expression a will be referred to as the left parameter and b as the right parameter; both
parameters must be positive. The parameters a and b are to be adapted to your data so as to obtain
the best possible fit. How to bring this about? First, of course, there must be some data available
to work with. To get some artificial data into Scilab, pass the next command:

x=grand(500,1,"bet",5,2);

By this action Scilab generates a sample of size 500 from a beta distributed population with
parameters a = 5 and b = 2. Be sure the fit library is loaded and then try to retrieve your beta
population parameters by passing the command

p=fitbet(x,"verbose")

For further reference, see the previous section about the function fitgam.
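
As a plausibility check on the output of a fit, rough estimates can be computed from the sample mean and variance. The sketch below uses built-in functions only; it applies the method of moments, which is a different technique from the maximum likelihood principle used by fitbet, so the numbers will be close to, but not identical with, the fitted values.

```scilab
x = grand(500,1,"bet",5,2);

m = mean(x);
v = sum((x-m).^2)/(length(x)-1);   // sample variance

// Moment estimates: solve  mean = a/(a+b),  var = a*b/((a+b)^2*(a+b+1))
c  = m*(1-m)/v - 1;                // estimate of a+b
a0 = m*c;
b0 = (1-m)*c;
disp([a0, b0])
```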

2.3 How to run the fit tool fitbpr

In this section it is explained how to use the fit tool fitbpr. The function fitbpr fits a beta prime
distribution to empirical data according to the maximum likelihood principle (for details of this
principle, see for example [2]). The probability density of a beta prime distribution is of the form:

$$f(t) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, \frac{t^{a-1}}{(1+t)^{a+b}} \qquad \text{for } t > 0$$

In this expression a will be referred to as the left parameter and b as the right parameter; both
parameters must be positive. The parameters a and b are to be adapted to your data so as to obtain
the best possible fit. First, of course, there must be some data available to work with. To get some
artificial data into Scilab, pass the next command:

x=grand(500,1,"bet",5,2);

By this action Scilab generates a sample of size 500 from a beta distributed population with param-
eters a = 5 and b = 2. To turn this into a sample from a beta prime distribution with parameters
a = 5 and b = 2, transform x as

x=x./(1-x);

Now try to retrieve your beta prime population parameters by passing the command

p=fitbpr(x,"verbose")

For further reference, see Section 2.1 about the function fitgam.

2.4 How to run the fit tool fitweibull

In this section it is explained how to use the fit tool fitweibull. The function fitweibull fits a Weibull
distribution to empirical data according to the maximum likelihood principle (for details of this
principle, see for example [2]). The probability density of a Weibull distribution is of the form:

$$f(t) = a\, b\, t^{b-1} e^{-a t^{b}} \qquad \text{for } t > 0$$

In this expression a will be referred to as the first parameter and b as the second parameter; both
parameters must be positive. The parameters a and b are to be adapted to your data so as to obtain
the best possible fit. First, of course, there must be some data available to work with. To get some
artificial data into Scilab, pass the next command:

x=randweibull(500,1,[4,2]);

By this action Scilab generates a sample of size 500 from a Weibull distributed population with
parameters a = 4 and b = 2. Now try to retrieve your Weibull population parameters by passing
the command

p=fitweibull(x,"verbose")

For further reference, see Section 2.1 about the function fitgam.
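
A Weibull sample for the density given above can also be produced with built-in functions only, by inverting the distribution function F(t) = 1 − exp(−a t^b). This is a sketch for illustration; whether the library function randweibull works in this way is an assumption that was not verified.

```scilab
a = 4; b = 2;
u = grand(500,1,"def");          // uniform numbers in [0,1)
x = (-log(1-u)/a).^(1/b);        // solves 1 - exp(-a*x^b) = u

// Sanity check: the median of this distribution is (log(2)/a)^(1/b)
xs = gsort(x,"g","i");
disp([xs(250), (log(2)/a)^(1/b)])
```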

2.5 How to run the fit tool fitnor

In this section it is explained how to use the fit tool fitnor. The function fitnor fits a normal
distribution to empirical data according to the principle of maximum likelihood (for details, see
for example [2]). The probability density of a normal distribution is of the form:

$$f(t) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}(t-\mu)^{2}/\sigma^{2}}$$

In this expression µ will be referred to as the mean and σ as the standard deviation; the latter
parameter must be positive. The parameters µ and σ are to be adapted to your data so as to obtain
the best possible fit. First, of course, there must be some data available to work with. To get some
artificial data into Scilab, pass the next command:

x=grand(500,1,"nor",75,10)

By this action Scilab generates a sample of size 500 from a normally distributed population with
parameters µ = 75 and σ = 10. Now try to retrieve your normal population parameters by passing
the command

p=fitnor(x,"verbose")

For further reference, see Section 2.1 about the function fitgam.
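
For the normal distribution the maximum likelihood estimates are known in closed form: the sample mean, and the root of the mean squared deviation from it (dividing by n, not by n − 1). The sketch below computes them with built-in functions only, so that the output of fitnor can be compared against it; the variable names are chosen for this illustration.

```scilab
x = grand(500,1,"nor",75,10);
n = length(x);

mu_hat    = sum(x)/n;
sigma_hat = sqrt(sum((x-mu_hat).^2)/n);   // note: divides by n, not n-1
disp([mu_hat, sigma_hat])
```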

2.6 How to run the fit tool fitlognor

In this section it is explained how to use the fit tool fitlognor. The function fitlognor fits a lognormal
distribution to empirical data according to the principle of maximum likelihood (for details of this
principle, see for example [2]). A random variable X has a lognormal distribution with parameters
a and b if and only if the variable log(X) has a normal distribution with mean a and standard
deviation b. The probability density of a lognormal distribution is of the form:

$$f(t) = \frac{1}{t\, b\sqrt{2\pi}}\, e^{-\frac{1}{2}\{(\log(t)-a)/b\}^{2}} \qquad \text{for } t > 0$$

In this expression a will be referred to as the first and b as the second parameter; the latter
parameter must be positive. The parameters a and b are to be adapted to your data so as to obtain
the best possible fit. First, of course, there must be some data available to work with. To get some
artificial data into Scilab, pass the next command:

x=grand(500,1,"nor",7,1)

By this action Scilab generates a sample of size 500 from a normally distributed population with
parameters a = 7 and b = 1. To turn this into a sample from a lognormal distribution with
parameters a = 7 and b = 1, transform x as

x=exp(x);

Now try to retrieve your lognormal population parameters by passing the command

p=fitlognor(x,"verbose")

For further reference, see Section 2.1 about the function fitgam.
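
Because log(X) is normally distributed, the maximum likelihood estimates of a and b can also be computed by hand from the logarithms of the data. The sketch below uses built-in functions only and is meant as a cross-check on the output of fitlognor, not as a description of how that function works internally.

```scilab
x = exp(grand(500,1,"nor",7,1));   // lognormal sample with a = 7 and b = 1
y = log(x);
n = length(y);

a_hat = sum(y)/n;
b_hat = sqrt(sum((y-a_hat).^2)/n);
disp([a_hat, b_hat])
```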

2.7 How to run the fit tool fitMIXnor

In this section it is explained how to use the fit tool fitMIXnor. The function fitMIXnor fits a
mixture of two normal distributions to empirical data according to the principle of maximum
likelihood (for details of this principle, see for example [2]). The probability density of a mixture
of two normal distributions is of the form:

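$$f(t) = \pi\, \varphi(t \mid \mu_1, \sigma_1) + (1-\pi)\, \varphi(t \mid \mu_2, \sigma_2)$$

where

$$\varphi(t \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}(t-\mu)^{2}/\sigma^{2}}$$
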
In the expression for f the parameters µ1 and µ2 will be referred to as the means, the parameters
σ1 and σ2 as the standard deviations in the mixture (the latter parameters must be positive). The
normal distributions N (µ1 , σ1 ) and N (µ2 , σ2 ) are called the first and second component in the
mixture respectively. Finally, there is the parameter π which is a number in between 0 and 1.
This parameter will be called the participation fraction of the first component. The parameters
µ1 , µ2 , σ1 , σ2 and π are to be adapted to your data such as to have a maximal fit. First, of course,
there must be some data available to work with. To get some artificial data into Scilab, pass the
next command:

x=randMIXnor(500,1,[7,2,12,1,0.3]);

to get a sample of size 500 from a mix of a N (7, 2)-distribution and a N (12, 1)-distribution (with
a participation fraction of 0.3 of the first) loaded into Scilab. Try to retrieve your parameters by
passing the command

p=fitMIXnor(x,"verbose")

For further reference, see Section 2.1 about the function fitgam.
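
What it means to draw from such a mixture can be sketched with built-in functions only: every observation comes from the first component with probability π and from the second component otherwise. The lines below are an illustration; they are not claimed to be the implementation of randMIXnor.

```scilab
n  = 500;
p1 = 0.3;                           // participation fraction of component 1

fromfirst = grand(n,1,"def") < p1;  // true where component 1 is selected
x1 = grand(n,1,"nor",7,2);
x2 = grand(n,1,"nor",12,1);

x = x2;
x(fromfirst) = x1(fromfirst);

// The mean of the mixture is p1*7 + (1-p1)*12 = 10.5
disp([mean(x), p1*7 + (1-p1)*12])
```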

2.8 How to run the fit tool fitMIXgam

In this section it is explained how to use the fit tool fitMIXgam. The function fitMIXgam fits
a mixture of two gamma distributions to empirical data according to the maximum likelihood
principle (for details of this principle, see for example [2]). The probability density of a mixture
of two gamma distributions is of the form:

$$f(t) = \pi\, \varphi(t \mid a_1, b_1) + (1-\pi)\, \varphi(t \mid a_2, b_2) \qquad \text{for } t > 0$$

where

$$\varphi(t \mid a, b) = \frac{b^{a}}{\Gamma(a)}\, t^{a-1} e^{-bt} \qquad \text{for } t > 0$$

In the expression for f, the parameters a1 and a2 will be referred to as the shape parameters and
b1 and b2 as the scale parameters in the mixture (all parameters must be positive). The gamma
distributions Γ(a1, b1) and Γ(a2, b2) are called the first and second component in the mixture
respectively. Finally, there is the parameter π, which is a number between 0 and 1. This parameter
will be called the participation fraction of the first component. The parameters a1, a2, b1, b2 and
π are to be adapted to your data so as to obtain the best possible fit. First, of course, there must
be some data available to work with. To get some artificial data into Scilab, pass the next command:

x=randMIXgam(1000,1,[7,3,2,2,0.3]);

to get a sample of size 1000 from a mix of a Γ(7, 3)-distribution and a Γ(2, 2)-distribution (with
a participation fraction of 0.3 of the first) loaded into Scilab. Try to retrieve your parameters by
passing the command

p=fitMIXgam(x,"verbose")

For further reference, see Section 2.1 about the function fitgam.
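
The mixture density above can be evaluated directly with built-in functions. The sketch below does so for the parameters used in this section and checks numerically that the density integrates to about 1; the helper gamdens is a hypothetical name introduced for this illustration and is not part of the fit library.

```scilab
// Gamma density b^a/gamma(a) * t^(a-1) * exp(-b*t), evaluated on the log scale
function f = gamdens(t, a, b)
    f = exp(a*log(b) - gammaln(a) + (a-1)*log(t) - b*t);
endfunction

p1 = 0.3;                               // participation fraction of component 1
t  = linspace(0.001, 20, 4000);
f  = p1*gamdens(t,7,3) + (1-p1)*gamdens(t,2,2);

// A density should integrate to (approximately) 1
total = inttrap(t, f);
disp(total)
```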

3 How to use the plot utilities in the fit library

3.1 Making histograms with Scilab

In Scilab there is the comfortable function ‘histplot’ to make histograms of your data. To illustrate
the use of this function, generate some artificial data first (just to have something to work with):

x=grand(400,1,"nor",175,10);

A sample of length 400 from a normally distributed population with mean 175 and standard devi-
ation 10 is now generated and stored in a variable named x. To make a histogram with 15 bars of
this artificial sample, just pass the command

histplot(15,x)

A second window will then appear with the desired histogram in it. It is also possible to specify
the class boundaries (bins) of the histogram yourself. To bring this about, store the boundaries in
a vector (that could be named for example bints) like so

bints=[140,155,165,175,185,200];

Then pass the command

histplot(bints,x)
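
If you want equally wide classes that cover the whole sample, the boundaries need not be typed in by hand. The sketch below assumes the built-in function linspace; note that 16 boundaries give 15 classes.

```scilab
x = grand(400,1,"nor",175,10);

// 16 equally spaced boundaries give 15 classes that cover the whole sample
bints = linspace(min(x), max(x), 16);
histplot(bints, x)
```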

To learn some more about the graphical features in Scilab, just play around a bit with the buttons
‘Zoom’ and ‘Edit’. Also try to export your graph, for example as an eps-file, by following the
menu path

File −→ Export

You will learn then how to save your graphs in several graphical formats.

3.2 The plot functions plotgam, plotbet, plotbpr, ....

The functions plotgam, plotbet, plotbpr, plotweibull, plotnor, plotlognor can be used to quickly
get a graph of the density functions of the associated probability distributions. They all work in
the same way and for that reason this section only explains how to run the function plotgam.
To get a plot of a gamma density with shape parameter 7 and scale parameter 2, store the
values of these parameters in a vector

p=[7,2]

Then pass the command

plotgam(p)

and you will have a graph of the density in question, as sketched below:

[Figure: graph of the gamma density with shape parameter 7 and scale parameter 2; horizontal axis from 0 to 10, vertical axis from 0.00 to 0.36]

If you want to customize the endpoints of the axes in the graph, then close the old graphic window
and pass a command of this type

plotgam(p,[1,14],[0,0.4])

The horizontal axis will then start at 1 and end at 14, the vertical axis will start at 0 and end at 0.4.

To learn some more about the graphical features, get some artificial data into Scilab by generating
a sample x from a gamma distributed population with shape 7 and scale 2 like this

x=grand(400,1,"gam",7,2);

Make a histogram with 20 bars of your sample x like so

histplot(20,x)

Then try to extract the population parameters from the sample by passing the command

p=fitgam(x)

Now superpose the fitted density on your histogram graph:

plotgam(p)

The result will be a histogram together with the fitted density in one and the same graph, as
sketched below

[Figure: histogram of the sample x with the fitted gamma density superposed; horizontal axis from 1 to 9, vertical axis from 0.0 to 0.4]

Be aware that your result is likely to differ somewhat from the above. This is due to the random
way the data vector x was created. The figure can, to a certain extent, be edited by clicking the
button ‘Edit’ and then activating the so-called ‘Entity picker’. After having activated this function,
try it out by clicking for example on the curve in the graph.
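
A density curve can also be drawn over a histogram without the plot functions of the library. The sketch below uses the built-in functions histplot and plot2d and, for simplicity, the population parameters a = 7 and b = 2 instead of fitted ones; it is an illustration, not the way plotgam is implemented.

```scilab
x = grand(400,1,"gam",7,2);
histplot(20, x)

// Gamma density with a = 7 and b = 2 on a grid of points
t = linspace(0.01, max(x), 200);
f = exp(7*log(2) - gammaln(7) + 6*log(t) - 2*t);

// plot2d draws into the current graphic window, on top of the histogram
plot2d(t, f)
```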

3.3 The plot functions plotMIXnor and plotMIXgam

The functions plotMIXnor and plotMIXgam can be used to quickly get a graph of the density
functions of the associated probability distributions. They can be run in the same way and for that
reason it is in this section only explained how to run the function plotMIXnor. To get an example
of a plot of a mixture of two normal distributions, create some parameter vector p by

p=[7,1,12,2,0.4,0.6]

This parameter vector defines a mixture of two normal distributions in which component 1 has
mean 7, standard deviation 1 and participation fraction 0.4 and in which component 2 has mean
12, standard deviation 2 and participation fraction 0.6. Now pass the command

plotMIXnor(p,1)

and you will have a graph of the density in question. If you are interested in the two components,
then close the old window and pass the command

plotMIXnor(p,2)

The mixture together with the two components in one and the same graph can be obtained like this
(don’t forget to first close the old graphic window):

plotMIXnor(p,3)

If you want to customize the endpoints of the axes in the graph, then pass a command of this type

plotMIXnor(p,3,[1,14],[0,0.4])

The endpoints of the axes will then be set to your wishes.

The density plots can be superposed on a histogram plot. To illustrate this, generate some artificial
data like this:

x=randMIXnor(600,1,p);

Create a histogram with 20 bars like so

histplot(20,x)

Next, fit a mixture of two normal distributions to the data by passing the command

pfit=fitMIXnor(x)

To get the fitted density in your histogram, pass the command

plotMIXnor(pfit,1)

The result will be a histogram together with the fitted density in one and the same graph, as
sketched below:

[Figure: histogram of the sample x with the fitted mixture density superposed; horizontal axis from 4.00 to 19.00, vertical axis from 0.0000 to 0.1900]

Note that your result will differ somewhat from the above, because your data (as it was randomly
generated) will not be exactly the same as that used to make the figure above.

4 How to get your data into Scilab and save your work

4.1 Data that is organized in just one column

This is a boring subject, but if you like to use Scilab then it is absolutely necessary to know how
you can get your data into this program. Let us sketch an easy way to do this. Suppose
you have some sequence of measurements at your disposal. To get them into Scilab, first store
the data in a so-called text file. This can be done by simply typing them in via a text editor; choose
for example the text editor WordPad on a Windows machine or choose Kate or Emacs on a Linux
machine. If you have your data stored in the form of a spreadsheet, for example an xls-file, then
there is a possibility to export it as a text file. You may then get a text file that looks like the one
below when opened in a text editor (in this case the editor Kate):

The text file above was named ‘example1.txt’ and it contains only 7 numbers. To get these data
into Scilab, proceed as follows:

Step 1 Copy the file (in this case the file ‘example1.txt’) to the working directory of Scilab.

Step 2 Pass the command

x=read("example1.txt",4,1)

Scilab then stores the first 4 numbers in a vector x. You can store all numbers in x like this

x=read("example1.txt",7,1)

or, if you don’t know how many numbers there are, like this:

x=read("example1.txt",-1,1)
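
To try out read without preparing a file in a text editor, a small text file can be written from within Scilab first. The sketch below uses built-in functions only; the file name demo1.txt and the numbers are made up for this illustration, and the file is placed in TMPDIR (the temporary directory of Scilab).

```scilab
// Write 7 numbers, one per line, to a text file.
// Note: write refuses to overwrite an existing file, so use a
// fresh file name when you run these lines a second time.
fname = TMPDIR + "/demo1.txt";
data = [3.1; 4.7; 5.2; 2.8; 6.0; 4.4; 3.9];
write(fname, data);

// Read them back without stating how many rows there are
x = read(fname, -1, 1);
disp(size(x))
```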

Don’t put headers in your data text files; you’ll get into trouble when applying the above. For
example, the above won’t work if you modify the file ‘example1.txt’ so that it starts with a
header line.

4.2 Data that is organized in several columns of equal length

Your data might of course be organized in columns, as in the text file ‘example2.txt’ below (opened
in the text editor Kate):

Then proceed as follows to get it into Scilab:

Step 1 Copy the file (in this case the file ‘example2.txt’) to the working directory of Scilab.

Step 2 Pass the command

x=read("example2.txt",4,2)

Scilab then stores the first 4 rows from the first 2 columns in a matrix x. You can store all
numbers in x like this

x=read("example2.txt",7,3)

or, if you don’t know how many rows there are, like this:

x=read("example2.txt",-1,3)

4.3 Data that is organized in several columns of unequal length

A complication arises when the columns in your dataset do not all have the same
length, for example, as in the text file ‘example3.txt’ below:

It is then impossible to store the data in a matrix in Scilab. What you could do here is the following:

Step 1 Save the columns in separate text files, say, the files ‘sequence1.txt’ and ‘sequence2.txt’.

Step 2 Copy these files to the Scilab working directory.

Step 3 Load, as far as this has not already been done, the library of fit tools into Scilab.

Step 4 Pass the command

x=mklist("sequence",2,"txt")

Scilab then stores the two columns in a so-called ‘list’, a list named x. It is just another
format to store your data in Scilab.
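
A list of this kind can also be built by hand with built-in functions, which may help to see what such a list contains. The sketch below writes two small demonstration files to TMPDIR (the temporary directory of Scilab) and reads each of them into its own list entry. It is an illustration only and is not claimed to be how mklist works internally.

```scilab
// Two columns of unequal length, written to separate files for the demonstration
write(TMPDIR + "/sequence1.txt", [1.2; 3.4; 5.6]);
write(TMPDIR + "/sequence2.txt", [7.8; 9.0]);

// Read every file into its own list entry
x = list();
for k = 1:2
    x($+1) = read(TMPDIR + "/sequence" + string(k) + ".txt", -1, 1);
end

disp(length(x(1)))
disp(length(x(2)))
```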

4.4 Saving your Scilab work

When running Scilab you will likely arrive at a point where you want to save your work. In Scilab
there are basically two ways to do this:

• Saving work to text files.

• Saving work to binary files.

To illustrate the first option, create some 6×2-matrix x, for example like this

x=grand(6,2,"nor",10,2)

or like so

x=[1,2;3,4;5,6;7,8;9,10;11,12]

The entries of the matrix you created above can now be written to a text file named ‘mymatrix.txt’
by passing the command

write("mymatrix.txt",x)

The text file ‘mymatrix.txt’ is then created in the Scilab working directory. You can look it up in
your Scilab working directory and open it with a text editor to see the result. Next time you start
Scilab you can get your matrix x again into Scilab like so

x=read("mymatrix.txt",6,2)

or like so

x=read("mymatrix.txt",-1,2)

This illustrates how work can be saved in the form of text files. Besides this method there is in
Scilab a possibility to save work in the form of so-called binary files. The content of such files
cannot be viewed in text editors, but their content can, of course, be (re)loaded into Scilab. To save
your matrix x in binary form, proceed as follows

save("mymatrix.bin",x)

The binary file ‘mymatrix.bin’ is then created in the Scilab working directory. You may look it up
and try to open it with a text editor: failure will be the result! However, the next time you start
Scilab the content of the file can be reloaded. To bring this about, pass the command

load mymatrix.bin

By this action the matrix x will be in the Scilab work space again. To convince yourself, just type

x

and strike the key ‘Enter’ and there is your matrix again. It is possible to store more than one
variable in a binary file. To illustrate this, create for example the transposed matrix y of x like so

y=x'

Next, save both x and y in the binary file ‘mymatrices.bin’ like this

save("mymatrices.bin",x,y)

Next time you start Scilab you can get your variables x and y into the work space by typing

load mymatrices.bin

In the same way you can store three or more variables in one binary file and reload them in a
subsequent Scilab session.
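
Should a newer Scilab release reject the save commands shown above, the following variant may help. It is a sketch under the assumption that your release expects the names of the variables as strings rather than the variables themselves; the file is placed in TMPDIR (the temporary directory of Scilab) for the purpose of this illustration.

```scilab
x = [1,2;3,4;5,6];
y = x';

// Variant for newer releases: pass the variable names as strings
fname = TMPDIR + "/mymatrices.bin";
save(fname, "x", "y");

// Remove the variables from the work space and load them again
clear x y
load(fname);
disp(x)
disp(y)
```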

References

1. Chancelier, J.P. et al, Introduction à SCILAB. (Springer Verlag, Berlin 2002)

2. Pestman, W.R., Mathematical Statistics. (Walter de Gruyter Verlag, Berlin 1998)

3. Pinçon, B., Une introduction à SCILAB. (www.iecn.u-nancy.fr/~pincon/scilab/scilab.html)

4. Urroz, G.E., Numerical and Statistical Methods with SCILAB for Science and Engineering.
(www.greatunpublished.com)

As to Scilab, a lot of free documentation is available in various languages. Visit the Scilab home
page http://scilabsoft.inria.fr/ to get an impression.
