(2011)
Estimate Survival Function for the Brain Cancer Disease by Using Three
Parameters Weibull Distribution
Iden Hassan Al Kanani
And
iden_alkanani@yahoo.com
Baghdad University
College of sciences for women
Abstract
In this research, the survival probability function, the hazard probability function and the failure (death) probability function of the three-parameter Weibull distribution are derived and computed for brain cancer with complete data. To achieve this, all three parameters (shape, scale and location) are estimated by two classical methods, the maximum likelihood (MLE) method and the ordinary least squares (OLS) method, and the estimates are used to find the survival function. The study is based on a sample of real data describing the survival durations of patients suffering from brain cancer, measured from the diagnosis of the disease or the patient's admission to hospital, during the period 1/1/2009 to 31/12/2009. All of the above probability functions are calculated and estimated, and the numerical results are compared using the mean squares error of the survival probability function. The comparison shows that the survival function for brain cancer estimated by the maximum likelihood method (MLE) is the best.
Introduction
The medical experiment is one of the most important kinds of experiment because it concerns human beings. Cancer is one of the diseases that threaten human health, and brain cancer is a lethal cancer that kills its victims. This study obtains real data on failure (death) times for this disease, which follow the three-parameter Weibull distribution, for complete data. All parameters of the Weibull distribution are estimated by two classical methods (the maximum likelihood estimator method and the ordinary least squares estimator method) with the help of an iterative method (the Newton-Raphson method). These parameter estimates are then used to estimate the death probability density function, the cumulative distribution function, the survival function and the hazard function. Finally, the survival functions obtained by the two methods are compared using the mean squares error.
Maximum Likelihood Estimator Method
The maximum likelihood (ML) method is one of the most popular and reliable methods for obtaining point estimators of the parameters of a distribution. It handles all types of samples, whether uncensored or censored in one way or another, and whether or not the data are grouped. MLEs have the following properties:
1. MLEs are consistent.
2. MLEs are asymptotically unbiased, although they may be biased in finite samples.
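The probability density function of the three-parameter Weibull distribution is:
$$f(t;\beta,\gamma,\theta)=\frac{\beta}{\theta}\left(\frac{t-\gamma}{\theta}\right)^{\beta-1}\exp\left[-\left(\frac{t-\gamma}{\theta}\right)^{\beta}\right],\qquad t>\gamma,$$
where $\beta>0$, $\gamma$ and $\theta>0$ are the shape parameter, location parameter and scale parameter, respectively.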
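The four functions reported later in Tables (2) and (5) are all determined by this density. The sketch below evaluates them at a single time point. It assumes the standard parametrization $S(t)=\exp\{-[(t-\gamma)/\theta]^{\beta}\}$; the function name `weibull3` and the parameter values in the usage lines are hypothetical and are not the paper's estimates.

```python
import math


def weibull3(t, beta, gamma, theta):
    """Return (f, F, S, h) of a three-parameter Weibull distribution at time t.

    Assumed parametrization: z = (t - gamma) / theta and S(t) = exp(-z**beta),
    with shape beta > 0, location gamma and scale theta > 0; requires t > gamma.
    """
    if t <= gamma:
        raise ValueError("t must exceed the location parameter gamma")
    z = (t - gamma) / theta
    survival = math.exp(-(z ** beta))           # S(t)
    cdf = 1.0 - survival                        # F(t) = 1 - S(t)
    hazard = (beta / theta) * z ** (beta - 1)   # h(t) = f(t) / S(t)
    density = hazard * survival                 # f(t) = h(t) * S(t)
    return density, cdf, survival, hazard


if __name__ == "__main__":
    # Hypothetical parameter values, used only to show the call.
    f, F, S, h = weibull3(50.0, beta=1.5, gamma=10.0, theta=70.0)
    print(f"f={f:.6f}  F={F:.6f}  S={S:.6f}  h={h:.6f}")
```

Computing the hazard first and multiplying by the survival avoids dividing by a survival value that can be very small in the right tail.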
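The likelihood function of the three-parameter Weibull distribution for a complete sample $t_1,\dots,t_n$, with shape $\beta$, location $\gamma$ and scale $\theta$, is:
$$L(\beta,\gamma,\theta)=\prod_{i=1}^{n}\frac{\beta}{\theta}\left(\frac{t_i-\gamma}{\theta}\right)^{\beta-1}\exp\left[-\left(\frac{t_i-\gamma}{\theta}\right)^{\beta}\right].$$
Taking the logarithm of the likelihood function, we get:
$$\ln L=n\ln\beta-n\beta\ln\theta+(\beta-1)\sum_{i=1}^{n}\ln(t_i-\gamma)-\sum_{i=1}^{n}\left(\frac{t_i-\gamma}{\theta}\right)^{\beta}.$$
The partial derivatives of the log-likelihood function with respect to the unknown parameters $\beta$, $\theta$ and $\gamma$ are:
$$\frac{\partial\ln L}{\partial\beta}=\frac{n}{\beta}+\sum_{i=1}^{n}\ln\frac{t_i-\gamma}{\theta}-\sum_{i=1}^{n}\left(\frac{t_i-\gamma}{\theta}\right)^{\beta}\ln\frac{t_i-\gamma}{\theta},$$
$$\frac{\partial\ln L}{\partial\theta}=-\frac{n\beta}{\theta}+\frac{\beta}{\theta}\sum_{i=1}^{n}\left(\frac{t_i-\gamma}{\theta}\right)^{\beta},$$
$$\frac{\partial\ln L}{\partial\gamma}=-(\beta-1)\sum_{i=1}^{n}\frac{1}{t_i-\gamma}+\frac{\beta}{\theta}\sum_{i=1}^{n}\left(\frac{t_i-\gamma}{\theta}\right)^{\beta-1}.$$
Setting these partial derivatives equal to zero gives the likelihood equations that must be solved for the three estimators.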
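Hand-derived likelihood equations are easy to get wrong, so a finite-difference comparison is a cheap check before handing them to Newton-Raphson. The sketch below codes the log-likelihood and its three partial derivatives under the same assumed standard parametrization; the names `loglik` and `score` are hypothetical, and `DATA` is the sample listed in the Description of Data section.

```python
import math

# Survival times in days (see Description of Data).
DATA = [11, 11, 15, 16, 17, 17, 25, 29, 31, 31, 32, 34, 35, 45, 46, 46, 52,
        61, 62, 63, 63, 64, 64, 66, 66, 69, 72, 72, 74, 75, 77, 80, 81, 82,
        84, 84, 89, 93, 96, 98, 112, 117, 142, 143, 147, 152, 166, 209, 223, 274]


def loglik(t, beta, gamma, theta):
    """Log-likelihood of a complete sample (assumed standard 3-parameter form)."""
    n = len(t)
    value = n * math.log(beta) - n * beta * math.log(theta)
    for ti in t:
        z = (ti - gamma) / theta
        value += (beta - 1) * math.log(ti - gamma) - z ** beta
    return value


def score(t, beta, gamma, theta):
    """Partial derivatives of loglik, in the order (d/dbeta, d/dgamma, d/dtheta)."""
    n = len(t)
    d_beta, d_gamma, d_theta = n / beta, 0.0, -n * beta / theta
    for ti in t:
        z = (ti - gamma) / theta
        d_beta += math.log(z) - z ** beta * math.log(z)
        d_gamma += -(beta - 1) / (ti - gamma) + (beta / theta) * z ** (beta - 1)
        d_theta += (beta / theta) * z ** beta
    return d_beta, d_gamma, d_theta


if __name__ == "__main__":
    # Hypothetical trial point; gamma must stay below the smallest time (11).
    print(loglik(DATA, 1.2, 5.0, 70.0), score(DATA, 1.2, 5.0, 70.0))
```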
Al Kanani & Jasim:Estimate Survival Function for the Brain Cancer Disease by Using Three
Since the estimators of the parameters cannot be obtained in closed form, because this system of nonlinear equations is too complex to solve simultaneously, we resort to the iterative methods of numerical analysis. The Newton-Raphson method is one of the best iterative methods in numerical analysis because it is very fast: close to the solution, its error decreases quadratically from one iteration to the next. An iterative procedure is a technique of successive approximations, and each approximation is called an iteration. If the successive approximations approach the solution very closely, the iterations are said to converge. The Newton-Raphson method requires an initial value for each unknown parameter. The steps of this method are as follows:
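Since the cumulative distribution function of the three-parameter Weibull distribution with shape $\beta$, location $\gamma$ and scale $\theta$ is
$$F(t)=1-\exp\left[-\left(\frac{t-\gamma}{\theta}\right)^{\beta}\right],$$
taking logarithms twice leads to the linear relationship
$$\ln\left\{-\ln\left[1-\hat F(t_i)\right]\right\}=\beta\ln(t_i-\gamma)-\beta\ln\theta,$$
where $\hat F(t_i)$ is the empirical cumulative distribution function, which can be found by the symmetrical CDF method:
$$\hat F(t_i)=\frac{i-0.5}{n},\qquad i=1,\dots,n,$$
where $n$ is the total sample size. The three functions $g_1$, $g_2$ and $g_3$ are the first derivatives of the log-likelihood function with respect to the unknown parameters $\beta$, $\theta$ and $\gamma$, respectively.
Then:
$$\begin{bmatrix}\hat\beta\\ \hat\theta\\ \hat\gamma\end{bmatrix}_{k+1}=\begin{bmatrix}\hat\beta\\ \hat\theta\\ \hat\gamma\end{bmatrix}_{k}-J_k^{-1}\begin{bmatrix}g_1\\ g_2\\ g_3\end{bmatrix}_{k},$$
where $J_k$ is the Jacobian matrix of $(g_1,g_2,g_3)$ evaluated at iteration $k$.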
The Jacobian matrix in the maximum likelihood method must be non-singular so that its inverse can be found; it is symmetric because its entries are the derivatives of the three first-derivative functions, that is, the second partial derivatives of the log-likelihood function. The absolute difference between the newly found value and the previous value of each parameter is the error term, and the iterations stop when this error is smaller than ε, a very small assumed value.
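The Newton-Raphson steps and the stopping rule above can be sketched generically as follows. This is a minimal illustration, not the authors' program: `g` and `jac` are hypothetical callables returning the three first-derivative functions and their Jacobian, the linear system J·step = −g is solved by Gaussian elimination instead of forming the inverse explicitly (the iterate is the same, but the computation is numerically safer), and the default tolerance of 1e-9 mirrors the admissible error bound used later in the paper.

```python
def solve_linear(a, b):
    """Solve a @ x = b for a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-14:
            raise ZeroDivisionError("Jacobian is (numerically) singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def newton_raphson(g, jac, x0, eps=1e-9, max_iter=100):
    """Iterate x_{k+1} = x_k - J(x_k)^{-1} g(x_k).

    Stops when the absolute change of every parameter is below eps
    (the 'error term' of the text). Returns (solution, iterations).
    """
    x = [float(v) for v in x0]
    for k in range(1, max_iter + 1):
        step = solve_linear(jac(x), [-v for v in g(x)])  # J * step = -g
        x = [xi + si for xi, si in zip(x, step)]
        if all(abs(s) < eps for s in step):
            return x, k
    raise RuntimeError("Newton-Raphson did not converge within max_iter")


if __name__ == "__main__":
    # Toy system with root (1, 1, 1): x^2 + y^2 + z^2 = 3, x = y, y = z.
    def toy_g(v):
        x, y, z = v
        return [x * x + y * y + z * z - 3.0, x - y, y - z]

    def toy_jac(v):
        x, y, z = v
        return [[2 * x, 2 * y, 2 * z], [1.0, -1.0, 0.0], [0.0, 1.0, -1.0]]

    root, iterations = newton_raphson(toy_g, toy_jac, [2.0, 1.5, 0.5])
    print(root, iterations)
```

For the MLE, `g` would return the three partial derivatives of the log-likelihood and `jac` their derivatives; the toy system only demonstrates the mechanics and the stopping rule.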
Ordinary Least Squares Method
The error term is formulated as:
So,
Summing the squares of both sides of equation (38) over all observations gives:
Now, differentiating equation (39) with respect to the unknown parameters, we get three functions, respectively:
These functions, numbered (40), (41) and (42), form a system of nonlinear equations that cannot be solved simultaneously in closed form, so we solve them iteratively by the Newton-Raphson method, as in the MLE method, by applying equation (15). The initial values of the unknown parameters in this method are assumed. The Jacobian matrix in equation (43) consists of the first derivatives of each of these functions with respect to the unknown parameters.
Then:
Here too the Jacobian matrix is non-singular and symmetric, because it is formed from the derivatives of the first-derivative functions. Finally, applying equation (14) gives the estimators of the three parameters of the Weibull distribution by the ordinary least squares method.
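As a sketch only, assuming that the residual of equation (38) is the gap between the symmetrical empirical CDF (i − 0.5)/n and the fitted Weibull CDF (an assumption, as are all names below), the least squares criterion can be written as follows. Setting the three partial derivatives of `sse` to zero yields a nonlinear system of the kind solved above by Newton-Raphson.

```python
import math


def symmetric_ecdf(n):
    """Plotting positions (i - 0.5) / n, i = 1..n (assumed 'symmetrical CDF')."""
    return [(i - 0.5) / n for i in range(1, n + 1)]


def weibull_cdf(t, beta, gamma, theta):
    """Three-parameter Weibull CDF in the assumed standard form; requires t > gamma."""
    return 1.0 - math.exp(-((t - gamma) / theta) ** beta)


def sse(times, beta, gamma, theta):
    """Sum of squared errors between empirical and fitted CDF (assumed OLS criterion)."""
    ordered = sorted(times)
    if gamma >= ordered[0]:
        raise ValueError("location parameter must be below the smallest failure time")
    fhat = symmetric_ecdf(len(ordered))
    return sum((fh - weibull_cdf(t, beta, gamma, theta)) ** 2
               for fh, t in zip(fhat, ordered))
```

On data placed exactly at the Weibull quantiles of the plotting positions, the criterion is zero at the generating parameters and positive elsewhere, which is a convenient sanity check before minimising it.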
Description of Data
In this research, real data on brain cancer are used. This type of cancer was chosen because it is widespread and deadly in Iraq at the present time. The data on brain cancer patients were collected from the Atomic Medicine and Radiation Hospital in Baghdad. The study period runs from 1/1/2009 until 31/12/2009, so the duration of the study is fixed at 365 days.
The number of patients during this period was 87: 50 patients were entered into the study, while the other 37 left the hospital and could not be followed up. All 50 patients died during the study period, which means that the data are complete. The survival times (in days) are as follows: 11, 11, 15, 16, 17, 17, 25, 29, 31, 31, 32, 34, 35, 45, 46, 46, 52, 61, 62, 63, 63, 64, 64, 66, 66, 69, 72, 72, 74, 75, 77, 80, 81, 82, 84, 84, 89, 93, 96, 98, 112, 117, 142, 143, 147, 152, 166, 209, 223, 274.
Goodness of Fit
We want to know whether the survival time data follow the Weibull distribution under consideration, so a goodness-of-fit test is used. The null and alternative hypotheses are as follows:
H0: The survival time data are distributed as Weibull.
H1: The survival time data are not distributed as Weibull.
In this test, the value of the chi-squared statistic is calculated as:
$$\chi^2=\sum_{i=1}^{k}\frac{(O_i-E_i)^2}{E_i},$$
where $O_i$ is the observed frequency for class $i$, $E_i$ is the expected frequency for class $i$, and $k$ is the total number of classes.
The number of degrees of freedom is $k-r-1$, where $r$ represents the number of parameters in the distribution. Using the statistical program Statgraphics, the calculated value of $\chi^2$ is 10.03, whereas the tabulated value of $\chi^2$ at the 0.05 level of significance with 7 degrees of freedom is 14.07. Since the calculated value is less than the tabulated value, the null hypothesis H0 is not rejected, which means that the survival time data can be regarded as distributed as Weibull.
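The χ² statistic and its degrees of freedom are simple to compute directly. The sketch below is a minimal illustration with hypothetical function names and toy frequencies; the class boundaries used by the statistical program are not reproduced here. With r = 3 estimated parameters, 7 degrees of freedom correspond to k = 11 classes.

```python
def chi_square_statistic(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O_i - E_i)^2 / E_i over classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of classes")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(num_classes, num_estimated_params):
    """k - r - 1 for a chi-square test with r parameters estimated from the data."""
    return num_classes - num_estimated_params - 1


if __name__ == "__main__":
    # Toy frequencies for illustration only (not the brain cancer data).
    print(chi_square_statistic([10, 20, 30], [15, 15, 30]))  # about 3.333
    print(degrees_of_freedom(11, 3))  # 7
```

The 0.05 critical value for 7 degrees of freedom, 14.07, can be checked with `scipy.stats.chi2.ppf(0.95, 7)`.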
Estimation of the Parameters by MLE Method
To find the estimated values of the shape, scale and location parameters of the Weibull distribution, the Newton-Raphson method, one of the methods of numerical analysis, was used. This method needs initial values to solve the three nonlinear equations obtained by differentiating the log-likelihood function with respect to the three unknown parameters of the Weibull distribution. The ordinary least squares method was used to calculate the initial values of the scale and shape parameters: the symmetrical cumulative distribution function formula was applied to obtain the values of the empirical cumulative distribution function, and the parameters of the resulting simple linear regression model were then estimated by the least squares method, as shown below:
, 1.625
Now, using the relationships between the regression coefficients and the Weibull parameters, the initial values of the shape and scale parameters of the Weibull distribution are obtained, respectively, as:
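The least squares starting values can be sketched as below. The sketch assumes the linearisation ln{−ln[1 − F̂(t_i)]} = β ln(t_i − γ) − β ln θ with F̂(t_i) = (i − 0.5)/n, so the regression slope estimates the shape and the intercept gives the scale. The starting location `gamma0` (5.0 in the usage lines) is a hypothetical choice that only has to lie below the smallest failure time of 11 days, and the output is not claimed to reproduce the coefficients quoted above.

```python
import math

# Survival times in days (see Description of Data).
DATA = [11, 11, 15, 16, 17, 17, 25, 29, 31, 31, 32, 34, 35, 45, 46, 46, 52,
        61, 62, 63, 63, 64, 64, 66, 66, 69, 72, 72, 74, 75, 77, 80, 81, 82,
        84, 84, 89, 93, 96, 98, 112, 117, 142, 143, 147, 152, 166, 209, 223, 274]


def initial_values(times, gamma0):
    """Least squares starting values (beta0, theta0) for an assumed location gamma0.

    Regresses y = ln(-ln(1 - F_hat)) on x = ln(t - gamma0), where
    F_hat = (i - 0.5) / n is the assumed symmetrical empirical CDF.
    Slope = beta, intercept = -beta * ln(theta).
    """
    ordered = sorted(times)
    if gamma0 >= ordered[0]:
        raise ValueError("gamma0 must be smaller than the smallest failure time")
    n = len(ordered)
    x = [math.log(t - gamma0) for t in ordered]
    y = [math.log(-math.log(1.0 - (i - 0.5) / n)) for i in range(1, n + 1)]
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return slope, math.exp(-intercept / slope)


if __name__ == "__main__":
    gamma0 = 5.0  # hypothetical starting location; must be below 11
    beta0, theta0 = initial_values(DATA, gamma0)
    print(f"beta0 = {beta0:.4f}, theta0 = {theta0:.4f}")
```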
To find the initial value of the location parameter, equation (11) was applied at first, but in practice it could not be used because it gave invalid results: the number under the fractional root was negative, so its logarithm is undefined. Finally, after many trials, it was decided that the initial value of the location parameter must be assumed. Any number smaller than the smallest failure time can be used, because every failure time must exceed the location parameter; that is, the initial value must be less than 11.
After that, the maximum likelihood method is applied to estimate the three unknown parameters of the Weibull distribution for complete data. To estimate these parameters, the Newton-Raphson method is used to solve equations (19), (20) and (21) with an admissible error bound of 0.000000001, based on the matrix equation (14). Many values of the location parameter were tried, and the one giving the smallest number of iterations and the smallest error was chosen. The result satisfying all these conditions is as follows:
Table (1): Estimated values for the parameters by MLE method
Estimated values
Number of iterations: 7
These estimated values of the three Weibull parameters are now used to find numerical values of the death probability density function f(t), the cumulative distribution function F(t), the survival function S(t) and the hazard function h(t). The results for these four functions are shown in the following table:
Table (2): Estimated values for the functions
   t    f(t)         F(t)          S(t)         h(t)
  11    0.00706928   0.004685129   0.99531487   0.00710255
  11    0.00706928   0.004685129   0.99531487   0.00710255
  15    0.00956421   0.039343360   0.96065663   0.00995591
  16    0.00980940   0.049035370   0.95096462   0.01031521
  17    0.01000067   0.058944279   0.94105572   0.01062708
  17    0.01000067   0.058944279   0.94105572   0.01062708
  25    0.01054641   0.141975729   0.85802427   0.01229150
  29    0.01048753   0.184088938   0.81591106   0.01285377
  31    0.01041379   0.204994368   0.79500563   0.01309901
  31    0.01041379   0.204994368   0.79500563   0.01309901
  32    0.01036828   0.215385848   0.78461415   0.01321450
  34    0.01026262   0.236019815   0.76398018   0.01343311
  35    0.01020335   0.246253139   0.75374686   0.01353685
  45    0.00945021   0.344720969   0.65527903   0.01442166
  46    0.00936399   0.354128187   0.64587181   0.01449822
  46    0.00936399   0.354128187   0.64587181   0.01449822
  52    0.00882451   0.408709740   0.59129025   0.01492417
  61    0.00798125   0.484348381   0.51565161   0.01547800
  62    0.00788713   0.492282568   0.50771743   0.01553449
  63    0.00779313   0.500122687   0.49987731   0.01559009
  63    0.00779313   0.500122687   0.49987731   0.01559009
  64    0.00769931   0.507868894   0.49213110   0.01564484
  64    0.00769931   0.507868894   0.49213110   0.01564484
  66    0.00751239   0.523080423   0.47691957   0.01575190
  66    0.00751239   0.523080423   0.47691957   0.01575190
  69    0.00723441   0.545199816   0.45480018   0.01590679
  72    0.00696015   0.566490639   0.43350936   0.01605538
  72    0.00696015   0.566490639   0.43350936   0.01605538
  74    0.00677979   0.580230238   0.41976976   0.01615121
  75    0.00669042   0.586965300   0.41303469   0.01619821
  77    0.00651344   0.600168770   0.39983122   0.01629047
  80    0.00625262   0.619316431   0.38068356   0.01642474
  81    0.00616700   0.625526192   0.37447380   0.01646845
  82    0.00608206   0.631650668   0.36834933   0.01651166
  84    0.00591427   0.643646529   0.35635347   0.01659664
  84    0.00591427   0.643646529   0.35635347   0.01659664
  89    0.00550753   0.672193316   0.32780668   0.01680116
  93    0.00519579   0.693595851   0.30640414   0.01695732
  96    0.00497018   0.708843043   0.29115695   0.0170704
  98    0.00482372   0.718636422   0.28136357   0.01714410
 112    0.00388739   0.779433476   0.22056652   0.01762459
 117    0.00359008   0.798119226   0.20188077   0.01778318
 142    0.00237198   0.871755940   0.12824405   0.01849587
 143    0.00233177   0.874107772   0.12589222   0.01852201
 147    0.00217685   0.883121916   0.11687808   0.01862498
 152    0.00199600   0.893548275   0.10645172   0.01875031
 166    0.00155873   0.918318435   0.08168156   0.01908305
 209    0.00070404   0.964748311   0.03525168   0.01997202
 223    0.00053815   0.973394981   0.02660501   0.02022755
 274    0.00019537   0.990720966   0.00927903   0.02105588
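The four columns of Table (2) are tied together by S(t) = 1 − F(t) and h(t) = f(t)/S(t), so individual rows can be checked directly. The sketch below does this for a few rows copied from the table, with a tolerance that allows for the rounding of the printed values; the helper name is hypothetical.

```python
# (t, f(t), F(t), S(t), h(t)) for selected rows of Table (2).
ROWS = [
    (11, 0.00706928, 0.004685129, 0.99531487, 0.00710255),
    (25, 0.01054641, 0.141975729, 0.85802427, 0.01229150),
    (89, 0.00550753, 0.672193316, 0.32780668, 0.01680116),
    (274, 0.00019537, 0.990720966, 0.00927903, 0.02105588),
]


def row_is_consistent(f, F, S, h, rel_tol=1e-3):
    """True when S = 1 - F and h = f / S hold up to rounding of the printed values."""
    survival_ok = abs((1.0 - F) - S) < 1e-6
    hazard_ok = abs(f / S - h) <= rel_tol * h
    return survival_ok and hazard_ok


if __name__ == "__main__":
    for t, f, F, S, h in ROWS:
        print(t, row_is_consistent(f, F, S, h))
```

A relative tolerance is used for the hazard because near t = 274 the survival value is small, so rounding in f(t) is amplified by the division.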
The results in the above table lead to the following comments:
1. The differences between the values of the death density function f(t) are very small, and the values are close to one another. The values of the death density function increase slightly up to t = 25 days and then decrease slightly until the end of the failure times. The patients who remained in hospital for 274 days had the smallest value of the death density (0.00019537), while the patients who remained in hospital for 25 days had the largest (0.01054641).
2. The values of the cumulative death distribution function F(t) increase with the failure times, because it accumulates the probabilities of all observations step by step; that is, there is a direct relationship between the failure times and the cumulative death distribution function.
3. The values of the survival function S(t) decrease gradually as the failure times of the brain cancer patients in the hospital increase; that is, there is an inverse relationship between the failure times and the survival function. The probability of survival was high when the patient's stay in hospital was short, and vice versa: the patient who remained 11 days in hospital had the greatest probability of survival (0.99531487), while the patient who remained 274 days had the smallest probability of survival (0.00927903).
4. The values of the hazard function h(t) increase gradually with the failure times of the brain cancer patients in the hospital; that is, there is a direct relationship between the failure times and the hazard function. The hazard was low when the patient's stay in hospital was short, and vice versa: the patient who remained 11 days in hospital had the smallest hazard (0.00710255), while the patient who remained 274 days had the largest hazard (0.02105588). Finally, the behaviour of h(t) is known to depend on the shape parameter: since the estimated shape parameter is greater than 1, the hazard function is an increasing function of t.
At last, the mean squares error of the survival probability function is calculated using the following formula:
$$\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left[\hat S(t_i)-\tilde S(t_i)\right]^2,$$
where $\hat S(t_i)$ is the estimated survival probability function and $\tilde S(t_i)$ is the empirical survival probability function, which can be found in the following table:
Table (3): Estimated (MLE) and empirical survival functions with squared errors
   t    Estimated S(t)   Empirical S(t)   Squared error
  11    0.99531487       1                0.00002195
  11    0.99531487       1                0.00002195
  15    0.96065663       0.96             0.00000043
  16    0.95096462       0.94             0.00012022
  17    0.94105572       0.92             0.00044334
  17    0.94105572       0.92             0.00044334
  25    0.85802427       0.88             0.00048293
  29    0.81591106       0.86             0.00194383
  31    0.79500563       0.84             0.00202449
  31    0.79500563       0.84             0.00202449
  32    0.78461415       0.80             0.00023672
  34    0.76398018       0.78             0.00025663
  35    0.75374686       0.76             0.00003910
  45    0.65527903       0.74             0.00717764
  46    0.64587181       0.72             0.00549498
  46    0.64587181       0.72             0.00549498
  52    0.59129025       0.68             0.00786941
  61    0.51565161       0.66             0.02083645
  62    0.50771743       0.64             0.01749867
  63    0.49987731       0.62             0.01442946
  63    0.49987731       0.62             0.01442946
  64    0.49213110       0.58             0.00772094
  64    0.49213110       0.58             0.00772094
  66    0.47691957       0.54             0.00397914
  66    0.47691957       0.54             0.00397914
  69    0.45480018       0.50             0.00204302
  72    0.43350936       0.48             0.00216137
  72    0.43350936       0.48             0.00216137
  74    0.41976976       0.44             0.00040926
  75    0.41303469       0.42             0.00004851
  77    0.39983122       0.40             0.00000002
  80    0.38068356       0.38             0.00000046
  81    0.37447380       0.36             0.00020949
  82    0.36834933       0.34             0.00080368
  84    0.35635347       0.32             0.00132157
  84    0.35635347       0.32             0.00132157
  89    0.32780668       0.28             0.00228547
  93    0.30640414       0.26             0.00215334
  96    0.29115695       0.24             0.00261703
  98    0.28136357       0.22             0.00376548
 112    0.22056652       0.20             0.00042298
 117    0.20188077       0.18             0.00047876
 142    0.12824405       0.16             0.00100844
 143    0.12589222       0.14             0.00019902
 147    0.11687808       0.12             0.00000974
 152    0.10645172       0.10             0.00004162
 166    0.08168156       0.08             0.00000282
 209    0.03525168       0.06             0.00061247
 223    0.02660501       0.04             0.00017942
 274    0.00927903       0.02             0.00011493
Then the mean squares error of the survival probability function, with the Weibull parameters estimated by the MLE method, is:
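The MSE can be recomputed from Table (3) with a few lines of code. Two assumptions are made: the divisor is n, and the empirical survival value at each time is one minus the proportion of failures strictly before it (tied times share a value), a rule that reproduces the empirical column of Table (3). The names are hypothetical, and the printed figure is a recomputation from the rounded tabulated values rather than a number quoted from the paper.

```python
# Failure times (days) and MLE survival estimates, copied from Table (3).
TIMES = [11, 11, 15, 16, 17, 17, 25, 29, 31, 31, 32, 34, 35, 45, 46, 46, 52,
         61, 62, 63, 63, 64, 64, 66, 66, 69, 72, 72, 74, 75, 77, 80, 81, 82,
         84, 84, 89, 93, 96, 98, 112, 117, 142, 143, 147, 152, 166, 209, 223, 274]
S_HAT = [0.99531487, 0.99531487, 0.96065663, 0.95096462, 0.94105572, 0.94105572,
         0.85802427, 0.81591106, 0.79500563, 0.79500563, 0.78461415, 0.76398018,
         0.75374686, 0.65527903, 0.64587181, 0.64587181, 0.59129025, 0.51565161,
         0.50771743, 0.49987731, 0.49987731, 0.49213110, 0.49213110, 0.47691957,
         0.47691957, 0.45480018, 0.43350936, 0.43350936, 0.41976976, 0.41303469,
         0.39983122, 0.38068356, 0.37447380, 0.36834933, 0.35635347, 0.35635347,
         0.32780668, 0.30640414, 0.29115695, 0.28136357, 0.22056652, 0.20188077,
         0.12824405, 0.12589222, 0.11687808, 0.10645172, 0.08168156, 0.03525168,
         0.02660501, 0.00927903]


def empirical_survival(times):
    """1 - (number of failures strictly before t) / n; tied times share a value."""
    n = len(times)
    return [1.0 - sum(1 for u in times if u < t) / n for t in times]


def mse(estimated, empirical):
    """Mean of the squared differences (divisor n assumed)."""
    if len(estimated) != len(empirical):
        raise ValueError("length mismatch")
    return sum((a - b) ** 2 for a, b in zip(estimated, empirical)) / len(estimated)


if __name__ == "__main__":
    emp = empirical_survival(TIMES)
    print(f"MSE (MLE) = {mse(S_HAT, emp):.8f}")
```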
Estimation of the Parameters by OLS Method
In this part, the ordinary least squares method is used to estimate the parameters of the Weibull distribution for complete data. The Newton-Raphson method is applied again, and it requires initial values for the shape, scale and location parameters, which are assumed. Many sets of initial values for all parameters were considered, and the set chosen was the one whose estimates satisfy two conditions of the Newton-Raphson method: the smallest error and the smallest number of iterations. The assumed initial values of the parameters are as follows:
Solving the three nonlinear equations obtained by differentiating the sum of squared errors, numbered (40), (41) and (42), with respect to the three unknown parameters gives the estimated values of the Weibull parameters tabulated below:
Table (4): Estimated values for the parameters by OLS method
Estimated values
Number of iterations: 7
Errors for the three parameters: 0.0000000000014, 0.0000000000075, 0.0000000000017
After that, these estimated values of the Weibull parameters are used to find numerical values of the death probability density function f(t), the cumulative distribution function F(t), the survival function S(t) and the hazard function h(t). The results for these four functions are shown in the following table.
Table (5): Estimated values for the functions
   t    f(t)         F(t)         S(t)         h(t)
  11    0.00090503   0.61737648   0.38262351   0.00236533
  11    0.00090503   0.61737648   0.38262351   0.00236533
  15    0.00060659   0.62032053   0.37967946   0.00159764
  16    0.00056039   0.62090342   0.37909657   0.00147824
  17    0.00052074   0.62144350   0.37855649   0.00137559
  17    0.00052074   0.62144350   0.37855649   0.00137559
  25    0.00033250   0.62474465   0.37525534   0.00088608
  29    0.00028161   0.62596725   0.37403274   0.00075290
  31    0.00026158   0.62650996   0.37349003   0.00070039
  31    0.00026158   0.62650996   0.37349003   0.00070039
  32    0.00025261   0.62676701   0.37323298   0.00067681
  34    0.00023638   0.62725564   0.37274435   0.00063416
  35    0.00022902   0.62748831   0.37251168   0.00061481
  45    0.00017466   0.62948226   0.37051773   0.00047140
  46    0.00017061   0.62965488   0.37034511   0.00046068
  46    0.00017061   0.62965488   0.37034511   0.00046068
  52    0.00014977   0.63061335   0.36938664   0.00040547
  61    0.00012658   0.63185113   0.36814886   0.00034384
  62    0.00012444   0.63197664   0.36802335   0.00033814
  63    0.00012237   0.63210005   0.36789994   0.00033263
  63    0.00012237   0.63210005   0.36789994   0.00033263
  64    0.00012037   0.63222142   0.36777857   0.00032729
  64    0.00012037   0.63222142   0.36777857   0.00032729
  66    0.00011655   0.63245831   0.36754168   0.00031713
  66    0.00011655   0.63245831   0.36754168   0.00031713
  69    0.00011127   0.63279993   0.36720006   0.00030302
  72    0.00010644   0.63312639   0.36687360   0.00029013
  72    0.00010644   0.63312639   0.36687360   0.00029013
  74    0.00010344   0.63333625   0.36666374   0.00028213
  75    0.00010201   0.63343898   0.36656101   0.00027829
  77    0.00009926   0.63364023   0.36635976   0.00027093
  80    0.00009539   0.63393214   0.36606785   0.00026060
  81    0.00009417   0.63402693   0.36597306   0.00025733
  82    0.00009298   0.63412051   0.36587948   0.00025414
  84    0.00009069   0.63430417   0.36569582   0.00024800
  84    0.00009069   0.63430417   0.36569582   0.00024800
  89    0.00008542   0.63474422   0.36525577   0.00023388
  93    0.00008163   0.63507823   0.36492176   0.00022370
  96    0.00007900   0.63531916   0.36468083   0.00021664
  98    0.00007734   0.63547549   0.36452450   0.00021217
 112    0.00006741   0.63648567   0.36351432   0.00018546
 117    0.00006446   0.63681527   0.36318472   0.00017750
 142    0.00005287   0.63827251   0.36172748   0.00014618
 143    0.00005250   0.63832520   0.36167479   0.00014515
 147    0.00005104   0.63853226   0.36146773   0.00014121
 152    0.00004933   0.63878315   0.36121684   0.00013656
 166    0.00004509   0.63944324   0.36055675   0.00012507
 209    0.00003568   0.64116426   0.35883573   0.00009944
 223    0.00003341   0.64164759   0.35835240   0.00009324
 274    0.00002712   0.64318020   0.35681979   0.00007601
The results in the above table lead to the following comments:
1. The differences between the values of the death density function f(t) are very small, and the values are close to one another. The values of the death density function decrease slightly until the end of the failure times. The patients who remained in hospital for 274 days had the smallest value of the death density (0.00002712), while the patients who remained in hospital for 11 days had the largest (0.00090503).
2. The values of the cumulative death distribution function F(t) increase with the failure times, because it accumulates the probabilities of all observations step by step; that is, there is a direct relationship between the failure times and the cumulative death distribution function.
3. The values of the survival function S(t) decrease gradually as the failure times of the brain cancer patients in the hospital increase; that is, there is an inverse relationship between the failure times and the survival function. The probability of survival was high when the patient's stay in hospital was short, and vice versa: the patient who remained 11 days in hospital had the greatest probability of survival (0.38262351), while the patient who remained 274 days had the smallest probability of survival (0.35681979).
4. The values of the hazard function h(t) decrease gradually as the failure times of the brain cancer patients in the hospital increase; that is, there is an inverse relationship between the failure times and the hazard function. The hazard was high when the patient's stay in hospital was short, and vice versa: the patient who remained 11 days in hospital had the greatest hazard (0.00236533), while the patient who remained 274 days had the smallest hazard (0.00007601). Finally, the behaviour of h(t) is known to depend on the shape parameter: since the estimated shape parameter is less than 1, the hazard function is a decreasing function of t.
At last, the mean squares error of the survival probability function for the OLS method is calculated by the formula in equation (55), where $\hat S(t_i)$ is now the survival probability function estimated by OLS.
The mean squares error of the survival probability function obtained by the MLE method is smaller than that obtained by the OLS method, so the MLE method is preferred.
Conclusions
The conclusions of this research are as follows:
1. Estimating the parameters of the three-parameter Weibull distribution by the MLE method for complete data:
a. There is a direct relationship between the failure times and the cumulative distribution function.
b. There is a direct relationship between the failure times and the hazard function.
c. There is an inverse relationship between the failure times and the survival function.
d. The relationship between the failure times and the death density function fluctuates: the density first increases and then decreases.
e. The survival function decreases as the failure times increase, and the hazard function increases as the failure times increase, when the shape parameter is greater than 1.
2. Estimating the parameters of the three-parameter Weibull distribution by the OLS method for complete data:
a. There is a direct relationship between the failure times and the cumulative distribution function.
b. There is an inverse relationship between the failure times and the hazard function.
c. There is an inverse relationship between the failure times and the survival function.
d. There is a fluctuating relationship between the failure times and the death density function.
e. The survival function decreases as the failure times increase, and the hazard function decreases as the failure times increase, when the shape parameter is less than 1.
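Conclusions 1e and 2e rest on a standard property of the Weibull hazard. Assuming the usual parametrization with shape β, location γ and scale θ (the symbols are an assumption of this sketch), differentiating the hazard shows that its slope has the sign of β − 1 for every t > γ:

$$h(t)=\frac{\beta}{\theta}\left(\frac{t-\gamma}{\theta}\right)^{\beta-1},\qquad
\frac{dh}{dt}=\frac{\beta(\beta-1)}{\theta^{2}}\left(\frac{t-\gamma}{\theta}\right)^{\beta-2},$$

so h(t) is increasing when β > 1, decreasing when β < 1, and constant (the exponential case) when β = 1.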