
ChE 707 Lecture Notes by A B Tengkiat

Data Fitting / Curve Fitting

Curve Fitting

Data fitting can be done by:
  (a) Least squares regression
  (b) Linear interpolation
  (c) Curvilinear interpolation

Curve Fitting

Curve fitting can involve:
  Interpolation, where an exact fit to the data is required; or
  Smoothing, in which a "smooth" function is constructed that approximately fits the data.

Curve fitting is the process of constructing a curve, or mathematical function, that best fits the data points. It is closely related to regression analysis.

Curve Fitting

Fitted curves can be used:
  as an aid for data visualization
  to infer values of a function where no data are available
  to summarize the relationship among two or more variables
  for extrapolation, though subject to a greater degree of uncertainty, since the result may reflect the method used to construct the curve more than the observed data

Curve Fitting

Curve fitting usually:
  means trying to find the curve that minimizes the vertical displacement of the points from the curve
  can be a smoothing process, since the number of fitted coefficients is typically less than the number of data points
  relaxes the constraint that the interpolant pass exactly through the data points, but requires it to approach the data points as closely as possible

Curve Fitting

Curve fitting requires parameterizing the potential interpolants and having some way of measuring the error; in the simplest case this leads to least squares approximation.

Curve Fitting

[Figure: polynomial fits to a data set — original curve (dotted line), 1st degree (red), 2nd degree (green), 3rd degree (orange), 4th degree (blue)]

Curve Fitting versus Smoothing

Curve Fitting
  often involves the use of an explicit function form
  concentrates on achieving as close a match with the data values as possible

Smoothing
  aims to give a general idea of relatively slow changes of value
  data are changed to "smoothed" values
  often has an associated tuning parameter used to control the extent of smoothing

Curve Fitting Application

Types of application when fitting experimental data:

Trend Analysis
  the process of using the pattern of the data to make a prediction or forecast
  for high-precision data, interpolation can be used
  for imprecise data, least squares regression should be used (a short sketch contrasting the two follows)
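To make the distinction concrete, here is a minimal sketch (Python with NumPy assumed; the data are hypothetical) that interpolates through precise points and fits a least squares line through noisy ones.

```python
import numpy as np

# Precise data: pass exactly through the points (interpolation)
x_precise = np.array([0.0, 1.0, 2.0, 3.0])
y_precise = np.array([1.0, 2.7, 7.4, 20.1])
y_at_1p5 = np.interp(1.5, x_precise, y_precise)   # exact-fit trend estimate

# Imprecise (noisy) data: do not chase every point, regress instead
x_noisy = np.linspace(0.0, 3.0, 20)
y_noisy = 2.0 * x_noisy + 1.0 + np.random.normal(scale=0.5, size=x_noisy.size)
slope, intercept = np.polyfit(x_noisy, y_noisy, deg=1)   # least squares line
trend_at_1p5 = slope * 1.5 + intercept

print(y_at_1p5, trend_at_1p5)
```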

Curve Fitting Application

Hypothesis Testing
  determination of the best-fit coefficients of a mathematical model
  if a mathematical model already exists, the adequacy of the model is tested
  selection of the best model from alternative models

Approximate Fit versus Exact Fit

Even if an exact match exists, it does not necessarily follow that we can find it. Depending on the algorithm used, the following may be encountered:
  a divergent case, where the exact fit cannot be calculated
  too much computational time required to find the solution

Either way, you might end up having to accept an approximate solution.

Approximate Fit versus Exact Fit

Runge's phenomenon: high order polynomials can tend to be highly oscillatory or lumpy.

An approximate fit may be preferred because it averages out questionable data points in a sample rather than distorting the curve to fit them exactly.

Approximate Fit versus Exact Fit

Low order polynomials tend to be smooth. The maximum number of inflection points possible in a polynomial curve is n − 2, where n is the order of the polynomial equation.

Criterion for Best Fit

The best fit corresponds to a minimization of the errors.

Sum of errors (residuals): results in a minimum value due to the cancellation of positive and negative errors.
  Σ ej = Σ (yj − a0 − a1 xj)

Criterion for Best Fit

Sum of the absolute errors (residuals): can have multiple solutions.
  Σ |ej| = Σ |yj − a0 − a1 xj|

Criterion for Best Fit

Minimax criterion: minimizes the maximum distance of an individual point from the curve; fails in the presence of outlier/s.
  min( max |ej| ) = min( max |yj − a0 − a1 xj| )

Criterion for Best Fit

Sum of squares of the residuals:
  Σ ej² = Σ (yj − a0 − a1 xj)²
  Best option, since it overcomes the weaknesses of the previous three criteria
  Provides a unique solution

A short sketch comparing the four criteria follows.
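A minimal sketch (hypothetical data and trial coefficients, NumPy assumed) that evaluates the four candidate criteria for a straight-line model y = a0 + a1x:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
a0, a1 = 0.1, 2.0                      # trial coefficients for y = a0 + a1*x

e = y - (a0 + a1 * x)                  # residuals

sum_errors = e.sum()                   # positive and negative errors cancel
sum_abs_errors = np.abs(e).sum()       # can admit multiple solutions
minimax = np.abs(e).max()              # sensitive to outliers
sse = (e ** 2).sum()                   # least squares: unique minimizer

print(sum_errors, sum_abs_errors, minimax, sse)
```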


Regression Analysis

Regression analysis refers to any approach to modeling the relationship between one or more variables denoted y and one or more variables denoted x, such that the model depends on unknown parameters to be estimated from the data.

The least squares method is used to determine the parameters of the model equation.

A model is called a linear model when the relationship is linear in the unknown parameters.

Regression Analysis

Most applications fall into two broad categories:

Prediction or Forecasting: fitting data to a predictive model; after such a model is developed, it is used to predict values that are not among the given set of data.

Correlation Testing: evaluation and/or quantification of the strength of the relationship between the dependent variable y and the independent variable/s x1, x2, ..., xn. The relationship may turn out to be nonexistent, redundant, or weak.

Regression Analysis

The first step is to plot and visually inspect the data to ascertain what form of model equation will apply, i.e.
  Linear model
  Nonlinear model

Regression Analysis

Given a model of the form
  y = Xβ + ε
where
  y = [y1, y2, ..., yn]ᵀ is the vector of observations
  X is the matrix of regressors, with row j = [xj1, xj2, ..., xjp]
  β = [β1, β2, ..., βp]ᵀ is the vector of parameters
  ε = [ε1, ε2, ..., εn]ᵀ is the vector of errors

Model equations contain
  Variable/s
  Parameter/s

A short sketch of estimating β by least squares follows.
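A minimal sketch (NumPy assumed, hypothetical data) of estimating β for this linear model by ordinary least squares:

```python
import numpy as np

# Hypothetical data: n = 5 observations, p = 2 regressors (intercept + x)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

X = np.column_stack([np.ones_like(x), x])   # design matrix, one row per observation

# Least squares estimate: beta minimizes ||y - X beta||^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ beta                    # estimates of the error term
print(beta, residuals)
```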

Regression Analysis

Regressand, y
  Also called
    Dependent variable
    Endogenous variable
    Response variable
    Measured variable
  Variable/s that is/are caused by, or directly influenced by, the other variables

Regression Analysis

Regressor, x
  Also called
    Exogenous variable
    Explanatory variable
    Covariate
    Input variable
    Predictor variable
    Independent variable

Regression Analysis

Regressor, x
  Can be a
    Constant (called the intercept)
    Linear term
    Nonlinear term

Regression Analysis

Parameter vector, β
  Is called the
    Effect
    Regression coefficients
  Statistical inference on regression focuses on β

Regression Analysis

Error term, ε
  Also called the disturbance term or noise
  Captures all factors other than the regressors x that influence the dependent variable y
  The relationship between the error term and the regressors, for example whether they are correlated, is a crucial consideration in formulating a linear regression model, as it determines the method to use for estimation

Regression Statistics

Sum of Squares of Errors (SSE)
  Also called
    Sum of Squared Residuals (SSR)
    Error Sum of Squares (ESS)
    Residual Sum of Squares (RSS)
  Sum of the squared differences between actual and predicted values

  SSE = Σ (y − ŷ)² = Σ [y − f(x)]²

Regression Statistics

Total Sum of Squares (TSS)
  Variance of the dependent variable around its mean
  Tells how much of the initial variation in the sample is explained by the regression

  TSS = Σ (y − ȳ)² = Σ y² − (Σ y)²/n

Regression Statistics

Standard Error of the Estimate (Sy/x)
  Also called the Standard Error of the Regression
  Quantifies the spread of the data around the regression line

  Sy/x = sqrt[ SSE / (n − (m + 1)) ] = sqrt[ Σ (y − ŷ)² / (n − (m + 1)) ]

  where n = number of data points
        m = number of parameters to be estimated

Regression Statistics

Coefficient of Determination (R²)
  Indicates the goodness of fit of the regression
  Ratio of the explained variance to the total variance, i.e. what fraction of the variation in the data is accounted for by the correlation

  R² = 1 − SSE/TSS = 1 − Σ (y − ŷ)² / Σ (y − ȳ)²

  Value is between 0 and 1
  Biased parameter, since it never decreases when additional regressors are added, even if they are irrelevant

Regression Statistics

Correlation Coefficient (R)
  Measures linearity
  Value is between 0 and 1

  R = sqrt( 1 − SSE/TSS ) = sqrt[ 1 − Σ (y − ŷ)² / Σ (y − ȳ)² ]

Regression Statistics

Adjusted R² (R²adj)
  Slightly modified version of R², designed to penalize an excess number of regressors that do not add to the explanatory power of the regression

  R²adj = 1 − (n − 1)/(n − p) · (1 − R²) = 1 − (n − 1) SSE / [(n − p) TSS] = 1 − (n − 1) Σ (y − ŷ)² / [(n − p) Σ (y − ȳ)²]

  where n = number of data points
        p = number of independent variables

Regression Statistics

Adjusted R² (continued)
  Always smaller in magnitude than R², because it accounts for the number of parameters being estimated
  Can even be negative for poorly fitting models

A short sketch computing these statistics follows.
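A minimal sketch (NumPy assumed, hypothetical data) computing SSE, TSS, the standard error of the estimate, R², and adjusted R² for a straight-line fit; p is counted here as the number of fitted parameters (intercept plus slope), which matches how spreadsheet tools report adjusted R²:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.8, 6.3, 7.9, 10.2, 11.8])

# Straight-line least squares fit: y = a0 + a1*x  (m = 1 regressor)
a1, a0 = np.polyfit(x, y, deg=1)
y_hat = a0 + a1 * x

n, m, p = len(y), 1, 2                         # p = fitted parameters (a0 and a1)
SSE = np.sum((y - y_hat) ** 2)                 # residual sum of squares
TSS = np.sum((y - y.mean()) ** 2)              # total sum of squares
s_yx = np.sqrt(SSE / (n - (m + 1)))            # standard error of the estimate
r2 = 1.0 - SSE / TSS                           # coefficient of determination
r2_adj = 1.0 - (n - 1) / (n - p) * (1.0 - r2)  # adjusted R^2 (penalizes extra regressors)

print(SSE, TSS, s_yx, r2, r2_adj)
```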

Regression Statistics

t Statistic
  Tests the existence of coefficient/s
  Tests equality of means, i.e. of the actual and predicted data
  Large values indicate that the hypothesis that the coefficient is zero can be rejected, i.e. the corresponding coefficient is nonzero
  The physical sense of a coefficient should be checked before accepting or rejecting it

Regression Statistics

p Value
  Expresses the result of the hypothesis test as a significance level using the t distribution
  Values smaller than 0.05 are taken as evidence that the coefficient is nonzero at the 95% confidence level

Regression Statistics

F Statistic
  Tests goodness or lack of fit
  Test of variance, i.e. between the actual data and the predictions
  Large values indicate a good fit between actual and predicted data

  F = explained variance / unexplained variance

Regression Statistics

Significance F
  Is the equivalent, for the F statistic, of the p-value of the t statistic
  Expresses the result of the hypothesis test as a significance level using the F distribution
  Values smaller than 0.05 are taken as evidence that the correlation is good

A short sketch of computing these significance levels follows.
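A minimal sketch (SciPy assumed, hypothetical numbers) of converting a t statistic and an F statistic into the p-value and Significance F reported by spreadsheet regression tools:

```python
from scipy import stats

n = 6          # data points
k = 3          # fitted parameters (e.g. intercept, x, x^2)

# Two-sided p-value for a coefficient's t statistic with n - k degrees of freedom
t_stat = 2.48
p_value = 2.0 * stats.t.sf(abs(t_stat), df=n - k)

# Significance F for the regression F statistic with (k - 1, n - k) degrees of freedom
f_stat = 1004.8
significance_f = stats.f.sf(f_stat, dfn=k - 1, dfd=n - k)

print(p_value, significance_f)
```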

Linear Regression

Linear regression was the first type of regression analysis to be studied rigorously, and it is used extensively in practical applications.

Linear Regression

It is easier to fit linear models than nonlinear models, and the statistical properties of the resulting estimators are easier to determine.

Linear equations or models are usually fitted through the least squares approach.

Even though the terms "least squares" and "linear model" are closely linked, they are not synonymous: the least squares approach can also be used to fit nonlinear models.

Example 1

Fit the data to a quadratic equation, i.e. y = a0 + a1x + a2x², by transformation.

  x     y
  0      2.1
  1      7.7
  2     13.6
  3     27.2
  4     40.9
  5     61.1

Example 1

Using the quadratic equation: y = a0 + a1x + a2x²

Regression Statistics
  Multiple R           0.9993
  R Square             0.9985
  Adjusted R Square    0.9975
  Standard Error       1.1175
  Observations         6

Fit is very good since the values are close to 1; the drop from R² to adjusted R² is negligible, indicating the validity of the variables.

ANOVA
              df        SS          MS          F       Significance F
  Regression   2   2509.647    1254.823    1004.777     5.76E-05
  Residual     3      3.746571    1.248857
  Total        5   2513.393

Significance F << 0.05, indicating good prediction.

Example 1

P-values > 0.05 indicate that the existence of a term is questionable; possible removal of the constant or variable.

Coefficients
              Coefficient  Std Error   t Stat   P-value  Lower 95%  Upper 95%  P-value Test
  Intercept      2.4786      1.0128    2.4471    0.0919    -0.7447     5.7019   Failed
  x              2.3593      0.9527    2.4764    0.0896    -0.6727     5.3912   Failed
  x²             1.8607      0.1829   10.1735    0.0020     1.2787     2.4428   Passed

RESIDUAL OUTPUT                                        PROBABILITY OUTPUT
  Observation  Predicted y  Residual  Std Residual     Percentile       y
       1          2.4786    -0.3786     -0.4373           8.33       2.1000
       2          6.6986     1.0014      1.1569          25.00       7.7000
       3         14.6400    -1.0400     -1.2014          41.67      13.6000
       4         26.3029     0.8971      1.0364          58.33      27.2000
       5         41.6871    -0.7871     -0.9093          75.00      40.9000
       6         60.7929     0.3071      0.3548          91.67      61.1000

The residual is insignificant for some data points.

Fit is good but statistically not sound.
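As a rough cross-check, a sketch like the one below (statsmodels assumed available) reproduces this kind of regression summary for the transformed quadratic fit; the last digits may differ slightly from the tabulated output.

```python
import numpy as np
import statsmodels.api as sm

x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 7.7, 13.6, 27.2, 40.9, 61.1])

# Transformation: treat x and x^2 as two linear regressors plus an intercept
X = sm.add_constant(np.column_stack([x, x ** 2]))
fit = sm.OLS(y, X).fit()

print(fit.params)                      # a0, a1, a2
print(fit.rsquared, fit.rsquared_adj)  # R^2 and adjusted R^2
print(fit.pvalues)                     # t-test p-values for each coefficient
print(fit.f_pvalue)                    # Significance F
```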

Example 1

Removing the constant from the quadratic equation: y = a1x + a2x²

Regression Statistics
  Multiple R           0.9991
  R Square             0.9982
  Adjusted R Square    0.7478
  Standard Error       1.6752
  Observations         6

Fit is very good since the values are close to 1; the drop in adjusted R² is due to force-fitting a quadratic equation without a constant.

ANOVA
              df        SS          MS          F       Significance F
  Regression   2   6383.295    3191.647    1137.296     4.78E-05
  Residual     4     11.22539     2.806348
  Total        6   6394.52

Significance F << 0.05, indicating good prediction.

Example 1

P-values < 0.05 indicate that the existence of the variables is statistically valid.

Coefficients
        Coefficient  Std Error  t Stat   P-value  Lower 95%  Upper 95%
  x        4.1374      0.9237   4.4791    0.0110     1.5728     6.7020
  x²       1.5913      0.2189   7.2682    0.0019     0.9834     2.1992

RESIDUAL OUTPUT                                        PROBABILITY OUTPUT
  Observation  Predicted y  Residual  Std Residual     Percentile       y
       1          0.0000     2.1000      1.5353           8.33       2.1000
       2          5.7287     1.9713      1.4412          25.00       7.7000
       3         14.6400    -1.0400     -0.7603          41.67      13.6000
       4         26.7339     0.4661      0.3408          58.33      27.2000
       5         42.0104    -1.1104     -0.8118          75.00      40.9000
       6         60.4696     0.6304      0.4609          91.67      61.1000

The residual is significant for some data points.

Fit is statistically sound.

Example 1

Which to use?
  Equation 1: y = a0 + a1x + a2x²
  Equation 2: y = a1x + a2x²

                       Eq 1         Eq 2
  R²                  0.9985       0.9982
  R²adj               0.9975       0.7478
  Standard Error      1.1175       1.6752
  Significance F    5.8 × 10⁻⁵   4.8 × 10⁻⁵

Eq 1 provides the better fit but is statistically unsound.

Example 2

Given the set of data, use linear and quadratic equations to fit the data set.

  ppm Pb⁺²     Q (mg Pb⁺²/g)
     0.000          0
     0.398          8.86
     2.805         16.78
    28.620         24.36
    67.451         27.51
   104.974         28.23
   139.615         32.57

[Figure: Q (mg Pb/g moss) versus Ceq (ppm)]

Example 2

Solution
Using the linear equation: Q = a + b (ppm Pb⁺²)

Regression Statistics
  Multiple R           0.8705
  R Square             0.7577
  Adjusted R Square    0.6971
  Standard Error       4.6794
  Observations         6

Fit is not good since the values are far from 1; the drop from R² to adjusted R² is not high, so the variable used is suitable.

ANOVA
              df        SS          MS          F       Significance F
  Regression   1   273.8812    273.8812    12.50776     0.024086
  Residual     4    87.58759    21.8969
  Total        5   361.4688

Passed, with Significance F < 0.05, hence good variance prediction.

Example 2

P-values < 0.05 indicate the existence of the variable and the constant.

Coefficients
              Coefficient  Std Error  t Stat   P-value  Lower 95%  Upper 95%
  Intercept     15.4286      2.8450   5.4231    0.0056     7.5296    23.3276
  ppm Pb⁺²       0.1301      0.0368   3.5366    0.0241     0.0280     0.2322

RESIDUAL OUTPUT                                        PROBABILITY OUTPUT
  Observation  Predicted Q  Residual  Std Residual     Percentile       Q
       1         15.4803    -6.6162     -1.5808           8.33       8.8641
       2         15.7936     0.9815      0.2345          25.00      16.7751
       3         19.1519     5.2058      1.2438          41.67      24.3577
       4         24.2038     3.3057      0.7898          58.33      27.5095
       5         29.0855    -0.8535     -0.2039          75.00      28.2320
       6         33.5921    -2.0234     -0.4834          91.67      31.5688

Fit is not good; the residuals are significant.

Example 2

Using the quadratic equation: Q = a + b (ppm Pb⁺²) + c (ppm Pb⁺²)²

Regression Statistics
  Multiple R           0.9314
  R Square             0.8676
  Adjusted R Square    0.7793
  Standard Error       3.9946
  Observations         6

Fit is not good since R² is far from 1; the drop from R² to adjusted R² is high, indicating that some variable/s is/are unnecessary.

ANOVA
              df        SS          MS         F       Significance F
  Regression   2   313.5979    156.7989    9.826348    0.048195
  Residual     3    47.87097    15.95699
  Total        5   361.4688

Significance F is near the 0.05 border, hence potentially poor prediction.

Example 2

P-values > 0.05 indicate that the existence of a variable is questionable; hence there is no validity in increasing the order of the polynomial.

Coefficients
              Coefficient  Std Error  t Stat   P-value  Lower 95%  Upper 95%  P-value Test
  Intercept     13.2241      2.8019   4.7196    0.0180     4.3070    22.1411   Passed
  ppm            0.3089      0.1176   2.6265    0.0786    -0.0654     0.6832   Failed
  ppm²          -0.0013      0.0009  -1.5777    0.2127    -0.0041     0.0014   Failed

RESIDUAL OUTPUT                                        PROBABILITY OUTPUT
  Observation  Predicted Q  Residual  Std Residual     Percentile       Q
       1         13.3467    -4.4826     -1.4487           8.33       8.8641
       2         14.0801     2.6950      0.8710          25.00      16.7751
       3         20.9637     3.3941      1.0969          41.67      24.3577
       4         27.9426    -0.4331     -0.1400          58.33      27.5095
       5         30.8335    -2.6015     -0.8408          75.00      28.2320
       6         30.1407     1.4281      0.4615          91.67      31.5688

Fit is not good; the residuals are significant.

Example 3

Correlate temperature as a function of reactor length using a polynomial equation.

[Figure: Reactor Temperature (K) versus Reactor Length (cm)]

Example 3

Solution
Using the quadratic equation: T = a + b L + c L²

Regression Statistics
  Multiple R           0.9924
  R Square             0.9848
  Adjusted R Square    0.9835
  Standard Error       13.4849
  Observations         26

Fit is good since the values are close to 1; the drop from R² to adjusted R² is minimal, so the variables used are suitable.

ANOVA
              df       SS        MS       F     Significance F
  Regression   2   271559    135780    747     1.2E-21
  Residual    23     4182       182
  Total       25   275741

Passed: Significance F << 0.05, hence good variance prediction.

Example 3

P-values < 0.05 indicate the existence of a variable.

Coefficients
              Coefficient  Std Error   t Stat     P-value     Lower 95%  Upper 95%  P-value Test
  Intercept     -4.5489     25.1386    -0.1810    0.8580       -56.5521    47.4543   Failed
  L             92.2626      2.3968    38.4947    2.18E-22       87.3045    97.2207   Passed
  L²            -1.9796      0.0527   -37.5762    3.76E-22       -2.0886    -1.8706   Passed

Remove the intercept to improve the fit.

Example 3

Using the quadratic equation without intercept: T = b L + c L²

Regression Statistics
  Multiple R           0.9999
  R Square             0.9998
  Adjusted R Square    0.9582
  Standard Error       13.2103
  Observations         26

Fit is good since the values are almost 1; the drop from R² to adjusted R² is higher than in the previous fit but still acceptable, hence the variables used are suitable.

ANOVA
              df        SS          MS        F      Significance F
  Regression   2   24125754   12062877   69123     3.48E-44
  Residual    24       4188        175
  Total       26   24129942

Passed: Significance F is almost nil, hence good variance prediction.

Example 3

P-values are almost nil, indicating the existence of the variables.

Coefficients
        Coefficient  Std Error    t Stat      P-value     Lower 95%  Upper 95%
  L       91.8379      0.4763    192.8312    8.36E-40       90.8549    92.8209
  L²      -1.9706      0.0172   -114.7757    2.11E-34       -2.0060    -1.9352

We have a good fit; try a higher order to check if a better correlation can be obtained.

Example 3

Using the cubic equation: T = a + b L + c L² + d L³

Regression Statistics
  Multiple R           0.9959
  R Square             0.9918
  Adjusted R Square    0.9906
  Standard Error       10.1622
  Observations         26

Fit is good since the values are almost 1; the drop from R² to adjusted R² is minimal, hence the variables used are suitable.

ANOVA
              df       SS       MS      F     Significance F
  Regression   3   273469    91156    883     4.58E-23
  Residual    22     2272      103
  Total       25   275741

Passed: Significance F is almost nil, hence good variance prediction.

Example 3

P-values are almost nil, indicating the existence of the variables and the constant.

Coefficients
              Coefficient  Std Error   t Stat    P-value     Lower 95%   Upper 95%
  Intercept    -242.2922    58.4318   -4.1466    0.0004      -363.4724   -121.1120
  L             129.2318     8.7831   14.7137    7.23E-13      111.0168    147.4469
  L²             -3.7398     0.4112   -9.0955    6.58E-09       -4.5925     -2.8871
  L³              0.0261     0.0061    4.3011    0.0003          0.0135      0.0387

This is a better fit than the quadratic equation and has a lower standard error. Try a higher order to check if a better correlation can be obtained.

Example 3

Using the quartic equation: T = a + b L + c L² + d L³ + e L⁴

Regression Statistics
  Multiple R           0.9994
  R Square             0.9988
  Adjusted R Square    0.9986
  Standard Error       3.9137
  Observations         26

Fit is good since the values are almost 1; the drop from R² to adjusted R² is minimal, hence the variables used are suitable.

ANOVA
              df       SS       MS       F      Significance F
  Regression   4   275420    68855    4495     1.83E-30
  Residual    21      322       15
  Total       25   275741

Passed: Significance F is almost nil, hence good variance prediction.

Example 3

P-values are almost nil, indicating the existence of the variables and the constant, except for the L variable.

Coefficients
              Coefficient  Std Error   t Stat     P-value     Lower 95%  Upper 95%
  Intercept     514.0935    70.7089     7.2706    3.68E-07      367.0463   661.1406
  L             -29.8530    14.4985    -2.0590    5.21E-02      -60.0043     0.2983
  L²              8.0375     1.0557     7.6137    1.80E-07        5.8421    10.2329
  L³             -0.3402     0.0325   -10.4536    8.86E-10       -0.4079    -0.2726
  L⁴              0.0041     0.0004    11.2839    2.25E-10        0.0033     0.0048

This is a better fit than the cubic equation and has a lower standard error. Refit, removing the L variable.

Example 3

Using the quartic equation without the L term: T = a + c L² + d L³ + e L⁴

Regression Statistics
  Multiple R           0.9993
  R Square             0.9986
  Adjusted R Square    0.9984
  Standard Error       4.1920
  Observations         26

Fit is good since the values are almost 1; the drop from R² to adjusted R² is minimal, hence the variables used are suitable.

ANOVA
              df       SS       MS       F      Significance F
  Regression   3   275355    91785    5223     1.59E-31
  Residual    22      387       18
  Total       25   275741

Passed: Significance F is almost nil, hence good variance prediction.

Example 3

P-values are almost nil, indicating the existence of the variables.

Coefficients
              Coefficient  Std Error   t Stat     P-value     Lower 95%  Upper 95%
  Intercept     369.1665     7.2326    51.0417    2.40E-24      354.1669   384.1661
  L²              5.8722     0.0988    59.4468    8.57E-26        5.6673     6.0770
  L³             -0.2741     0.0058   -47.6313    1.08E-23       -0.2861    -0.2622
  L⁴              0.0033     0.0001    37.1423    2.41E-21        0.0032     0.0035

The variation in R² without the L variable is negligible and the standard error is slightly higher, but the fit is statistically sound. Try a higher order fit.

Example 3

Using the quintic equation: T = a + b L + c L² + d L³ + e L⁴ + f L⁵

Regression Statistics
  Multiple R           0.9995
  R Square             0.9990
  Adjusted R Square    0.9987
  Standard Error       3.8016
  Observations         26

Fit is good since the values are almost 1; the drop from R² to adjusted R² is minimal, hence the variables used are suitable.

ANOVA
              df       SS       MS       F      Significance F
  Regression   5   275452    55090    3812     4.54E-29
  Residual    20      289       14
  Total       25   275741

Passed: Significance F is almost nil, hence good variance prediction.

Example 3

P-values are almost nil, indicating the existence of the variables and the constant, except for the L and L⁵ variables.

Coefficients
              Coefficient  Std Error   t Stat    P-value    Lower 95%    Upper 95%
  Intercept     826.0748   218.7051    3.7771    0.0012      369.8639    1282.2856
  L            -112.4497    56.7479   -1.9816    0.0614     -230.8236       5.9243
  L²             16.3610     5.6338    2.9041    0.0088        4.6090      28.1129
  L³             -0.7407     0.2684   -2.7597    0.0121       -1.3005      -0.1808
  L⁴              0.0133     0.0062    2.1610    0.0430        0.0005       0.0262
  L⁵           -8.21E-05     0.0001   -1.5025    0.1486    -1.96E-04     3.19E-05

This is better than the quartic equation, but the existence of the L variable is statistically doubtful. Try without the L variable.

Example 3

Using the quintic equation without the L term: T = a + c L² + d L³ + e L⁴ + f L⁵

Regression Statistics
  Multiple R           0.9994
  R Square             0.9987
  Adjusted R Square    0.9985
  Standard Error       4.0578
  Observations         26

Fit is good since the values are almost 1; the drop from R² to adjusted R² is minimal, hence the variables used are suitable.

ANOVA
              df       SS       MS       F      Significance F
  Regression   4   275396    68849    4181     3.91E-30
  Residual    21      346       16
  Total       25   275741

Passed: Significance F is almost nil, hence good variance prediction.

Example 3

P-values are almost nil, indicating the existence of the variables, except for the L⁴ and L⁵ variables.

Coefficients
              Coefficient  Std Error   t Stat    P-value     Lower 95%    Upper 95%
  Intercept     393.8714    17.1820   22.9235    2.41E-16      358.1395     429.6032
  L²              5.2247     0.4222   12.3756    4.12E-11        4.3467       6.1027
  L³             -0.2137     0.0388   -5.5143    1.80E-05       -0.2944      -0.1331
  L⁴              0.0013     0.0013    1.0460    0.3074         -0.0013       0.0040
  L⁵            2.28E-05     0.0000    1.5745    0.1303      -7.31E-06     5.29E-05

The quintic equation is not valid, since p-values exceed 0.05. Hence the quartic equation, i.e. T = a + c L² + d L³ + e L⁴, is the best because it satisfies all the conditions.
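A minimal sketch of this kind of order-selection study (NumPy assumed; the 26 measured points are not reproduced here, so illustrative data stand in for them):

```python
import numpy as np

def fit_poly(L, T, order):
    """Least squares polynomial fit of T(L); returns coefficients, R^2 and adjusted R^2."""
    coeffs = np.polyfit(L, T, order)
    T_hat = np.polyval(coeffs, L)
    n, p = len(T), order + 1                       # p = number of fitted parameters
    sse = np.sum((T - T_hat) ** 2)
    tss = np.sum((T - T.mean()) ** 2)
    r2 = 1.0 - sse / tss
    r2_adj = 1.0 - (n - 1) / (n - p) * (1.0 - r2)
    return coeffs, r2, r2_adj

# Illustrative data in place of the 26 reactor measurements
L = np.linspace(10.0, 35.0, 26)
T = 92.0 * L - 2.0 * L ** 2 + np.random.normal(scale=10.0, size=L.size)

# Compare orders 2 to 5 and watch whether adjusted R^2 keeps improving
for order in range(2, 6):
    _, r2, r2_adj = fit_poly(L, T, order)
    print(order, round(r2, 4), round(r2_adj, 4))
```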


Nonlinear Regression

In nonlinear regression, parameters appear inside functions, e.g. β², e^(βx), etc.

∂f/∂βj is usually a combination of the parameter and the independent variable.

Nonlinear regression requires initial value/s or estimate/s. The Method of False Position can be used to solve for a nonlinear parameter.

The solution
  is an iterative process
  may not be unique, due to multiple minima in the sum of squares

Nonlinear Regression

Nonconvergence, or failure to find a minimum value of SSE, is common.

Nonlinear regression is often avoided by simplifying the curve fitting through
  Linearization
  Segmentation of the curve into several sections, where each segment is fitted by a linear equation or a simplified linearized form
(a short sketch of a direct nonlinear fit follows)
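Where a direct nonlinear fit is acceptable instead, a minimal sketch with SciPy's curve_fit (assumed available) for the y = a + bxⁿ model of the next example looks like this; the starting guesses are illustrative:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, n):
    """Nonlinear model y = a + b * x**n."""
    return a + b * np.power(x, n)

x = np.array([0.40, 2.81, 28.62, 67.45, 104.97, 139.61])
y = np.array([8.86, 16.78, 24.36, 24.91, 28.23, 31.57])

# Initial estimates are required; poor guesses can lead to nonconvergence
p0 = [0.0, 10.0, 0.3]
params, cov = curve_fit(model, x, y, p0=p0, maxfev=10000)

a, b, n = params
sse = np.sum((y - model(x, *params)) ** 2)
print(a, b, n, sse)
```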

Example

Fit the data below to an equation of the form y = a + bxⁿ.

  ppm Pb⁺²     Q (mg Pb⁺²/g)
     0.00           0
     0.40           8.86
     2.81          16.78
    28.62          24.36
    67.45          27.51
   104.97          28.23
   139.62          32.57

[Figure: Q (mg Pb/g moss) versus Ceq (ppm)]

Example

Solution
  Model equation:  y = a + bxⁿ
  Residual:        ej = yj − a − bxjⁿ
  SSE:             SSE = Σ ej² = Σ (yj − a − bxjⁿ)²

Setting the partial derivatives to zero:
  ∂SSE/∂a = −2 Σ (yj − a − bxjⁿ) = 0
  ∂SSE/∂b = −2 Σ (yj − a − bxjⁿ) xjⁿ = 0
  ∂SSE/∂n = −2b Σ (yj − a − bxjⁿ) xjⁿ ln xj = 0

Example

Simplifying the equations (N = number of data points):
  Na + b Σxjⁿ = Σyj
  a Σxjⁿ + b Σxj²ⁿ = Σxjⁿyj
  a Σxjⁿ ln xj + b Σxj²ⁿ ln xj = Σxjⁿyj ln xj

Rearranging the 1st equation:
  a = (Σyj − b Σxjⁿ) / N

Example

Combining the equations gives b in terms of n alone:

  b = [N Σxjⁿyj − Σxjⁿ Σyj] / [N Σxj²ⁿ − (Σxjⁿ)²]
    = [N Σxjⁿyj ln xj − Σxjⁿ ln xj Σyj] / [N Σxj²ⁿ ln xj − Σxjⁿ ln xj Σxjⁿ]

With the 6 data points and Σyj = 134.71 (so ȳ = 22.45), the equation simplifies to

  [Σxjⁿyj − 22.45 Σxjⁿ] / [Σxj²ⁿ − (Σxjⁿ)²/6] = [Σxjⁿyj ln xj − 22.45 Σxjⁿ ln xj] / [Σxj²ⁿ ln xj − Σxjⁿ ln xj Σxjⁿ/6]

Example

To solve for n:
  Assume a value of n
  Compute the left-hand side (LHS) and right-hand side (RHS) of the equation

  LHS = [Σxjⁿyj − 22.45 Σxjⁿ] / [Σxj²ⁿ − (Σxjⁿ)²/6]
  RHS = [Σxjⁿyj ln xj − 22.45 Σxjⁿ ln xj] / [Σxj²ⁿ ln xj − Σxjⁿ ln xj Σxjⁿ/6]

When LHS = RHS, stop the iteration; otherwise change the value of n until LHS = RHS (a root-finding sketch follows).
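A minimal sketch of that iteration (SciPy assumed); a bracketing root finder stands in for the manual trial-and-error on n, with the bracket taken from the two trial values used in the tables:

```python
import numpy as np
from scipy.optimize import brentq

x = np.array([0.40, 2.81, 28.62, 67.45, 104.97, 139.61])
y = np.array([8.86, 16.78, 24.36, 24.91, 28.23, 31.57])
ybar = y.mean()                                   # 22.45
N = len(x)

def lhs_minus_rhs(n):
    xn, lnx = x ** n, np.log(x)
    lhs = (np.sum(xn * y) - ybar * np.sum(xn)) / (np.sum(xn ** 2) - np.sum(xn) ** 2 / N)
    rhs = ((np.sum(xn * y * lnx) - ybar * np.sum(xn * lnx))
           / (np.sum(xn ** 2 * lnx) - np.sum(xn * lnx) * np.sum(xn) / N))
    return lhs - rhs

# Bracket taken from the two trial values in the tables (n = -0.25 and n = 0.25)
n_root = brentq(lhs_minus_rhs, -0.25, 0.25)       # expected near 0.016

xn = x ** n_root
b = (np.sum(xn * y) - ybar * np.sum(xn)) / (np.sum(xn ** 2) - np.sum(xn) ** 2 / N)
a = (np.sum(y) - b * np.sum(xn)) / N
print(n_root, b, a)
```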

Example

Assume n = 0.25

    xj       yj     xjⁿ    yj,pred    ej     ej²    xj²ⁿ   xjⁿ ln xj  xj²ⁿ ln xj   xjⁿyj   xjⁿyj ln xj
    0.40    8.86    0.79    10.92   -2.06   4.23    0.63     -0.73      -0.58       7.04      -6.49
    2.81   16.78    1.29    14.71    2.07   4.28    1.67      1.34       1.73      21.71      22.40
   28.62   24.36    2.31    22.42    1.94   3.77    5.35      7.76      17.94      56.34     188.96
   67.45   24.91    2.87    26.60   -1.69   2.85    8.21     12.07      34.59      71.40     300.70
  104.97   28.23    3.20    29.14   -0.91   0.82   10.25     14.90      47.68      90.37     420.54
  139.61   31.57    3.44    30.93    0.64   0.41   11.82     16.98      58.36     108.52     535.94
  343.86  134.71   13.91                           37.93     52.30     159.72     355.37    1462.06

  LHS = 7.57    RHS = 7.47    b = 7.57    a = 4.91    SSE = 614.96

Example

Assume n = −0.25

    xj       yj     xjⁿ    yj,pred    ej      ej²    xj²ⁿ   xjⁿ ln xj  xj²ⁿ ln xj   xjⁿyj   xjⁿyj ln xj
    0.40    8.86    1.26     7.91    0.96    0.92    1.59     -1.16      -1.46      11.16     -10.29
    2.81   16.78    0.77    18.17   -1.39    1.93    0.60      0.80       0.62      12.96      13.37
   28.62   24.36    0.43    25.34   -0.98    0.97    0.19      1.45       0.63      10.53      35.32
   67.45   24.91    0.35    27.10   -2.19    4.78    0.12      1.47       0.51       8.69      36.61
  104.97   28.23    0.31    27.87    0.36    0.13    0.10      1.45       0.45       8.82      41.05
  139.61   31.57    0.29    28.33    3.24   10.52    0.08      1.44       0.42       9.18      45.36
  343.86  134.71    3.42                             2.67      5.45       1.17      61.35     161.42

  LHS = −21.09    RHS = −20.22    b = −21.09    a = 34.46    SSE = 18.34

Example

By Newton-Raphson solution, n = 0.01621

    xj       yj     xjⁿ    yj,pred    ej     ej²    xj²ⁿ   xjⁿ ln xj  xj²ⁿ ln xj   xjⁿyj   xjⁿyj ln xj
    0.40    8.86    0.99     9.26   -0.40   0.16    0.97     -0.91      -0.89       8.73      -8.05
    2.81   16.78    1.02    15.88    0.90   0.81    1.03      1.05       1.07      17.06      17.60
   28.62   24.36    1.06    24.02    0.34   0.12    1.11      3.54       3.74      25.72      86.26
   67.45   24.91    1.07    27.10   -2.19   4.78    1.15      4.51       4.83      26.68     112.34
  104.97   28.23    1.08    28.71   -0.48   0.23    1.16      5.02       5.41      30.44     141.68
  139.61   31.57    1.08    29.75    1.82   3.31    1.17      5.35       5.80      34.20     168.91
  343.86  134.71    6.29                            6.60     18.56      19.95     142.83     518.75

  LHS = 208.61    RHS = 208.61    b = 208.61    a = −196.25    SSE = 9.23

Example

Regression of y on xⁿ (n = 0.01621)

Regression Statistics
  Multiple R           0.9862
  R Square             0.9726
  Adjusted R Square    0.9658
  Standard Error       1.5323
  Observations         6

ANOVA
              df       SS        MS        F      Significance F
  Regression   1   333.69    333.69    142.12     0.0003
  Residual     4     9.39      2.35
  Total        5   343.08

Coefficients
              Coefficient  Std Error  t Stat   P-value  Lower 95%  Upper 95%
  Intercept     -196.25      18.36    -10.69    0.0004    -247.22    -145.29
  xⁿ             208.61      17.50     11.92    0.0003     160.03     257.19

Example

RESIDUAL OUTPUT                                        PROBABILITY OUTPUT
  Observation  Predicted y  Residual  Std Residual     Percentile       y
       1           9.26      -0.40       -0.29            8.33        8.86
       2          15.88       0.90        0.66           25.00       16.78
       3          24.02       0.34        0.25           41.67       24.36
       4          27.10      -2.19       -1.59           58.33       24.91
       5          28.71      -0.48       -0.35           75.00       28.23
       6          29.75       1.82        1.33           91.67       31.57

Methodology of the False Position Method

Determine the nonlinear parameter in the model equation.

Rearrange the model equation to isolate the nonlinear term.

Assume
  an initial guess for the nonlinear parameter, c
  an increment of the nonlinear parameter, Δc

Compute the linear parameter/s and the SSE of the model equation using
  c1 = c + Δc
  c2 = c
  c3 = c − Δc

Methodology of the False Position Method

A new c value is calculated by the Method of False Position:

  c5 = c2 + (SSE1 − SSE3) Δc / [2 (SSE1 − 2 SSE2 + SSE3)]

Repeat the process until a minimum SSE is calculated or the desired accuracy is reached (a sketch of the loop follows).
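A minimal sketch of this search loop (plain Python). The update shown is the standard three-point parabolic (successive quadratic) step, used here as a stand-in for the slide's false position update; linear_fit_sse is a hypothetical helper that fits the linear parameters and returns the SSE for a trial value of the nonlinear parameter:

```python
def parabolic_min_search(linear_fit_sse, c, dc, tol=1e-6, max_iter=50):
    """Three-point parabolic search for the value of c that minimizes SSE.

    linear_fit_sse(c): hypothetical helper returning the SSE after fitting
    the linear parameters for a trial value c of the nonlinear parameter.
    """
    for _ in range(max_iter):
        c1, c2, c3 = c + dc, c, c - dc
        s1, s2, s3 = linear_fit_sse(c1), linear_fit_sse(c2), linear_fit_sse(c3)

        denom = 2.0 * (s1 - 2.0 * s2 + s3)
        if denom == 0.0:                       # flat curvature: cannot interpolate further
            return c
        c_new = c2 + dc * (s3 - s1) / denom    # vertex of the parabola through the 3 points

        if abs(c_new - c) < tol:
            return c_new
        dc = max(abs(c_new - c), tol)          # shrink the increment as the minimum nears (assumed)
        c = c_new
    return c

# Toy usage: minimize a simple quadratic SSE surface starting from c = 0, dc = 1
print(parabolic_min_search(lambda c: (c - 3.0) ** 2 + 5.0, c=0.0, dc=1.0))
```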

Example 1

Fit the data below to an equation of the form y = a + bxⁿ.

  ppm Pb⁺²     Q (mg Pb⁺²/g)
     0.00           0
     0.40           8.86
     2.81          16.78
    28.62          24.36
    67.45          27.51
   104.97          28.23
   139.62          32.57

[Figure: Q (mg Pb/g moss) versus Ceq (ppm)]

Example 1

Solution
  Model equation: y = a + bxⁿ
  The nonlinear parameter is a
  Linearize the equation: ln(y − a) = ln b + n ln x
  Initial guesses: a = −214, Δa = 100 (see the sketch below)
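A minimal sketch (NumPy assumed) of the helper evaluated at each trial value of a: regress ln(y − a) on ln x to recover n and b, then compute the SSE in the original variables. Negative trial values of a keep y − a positive:

```python
import numpy as np

x = np.array([0.40, 2.81, 28.62, 67.45, 104.97, 139.61])
y = np.array([8.86, 16.78, 24.36, 24.91, 28.23, 31.57])

def linear_fit_sse(a):
    """Fit b and n of y = a + b*x**n by linearization and return (SSE, b, n)."""
    n_exp, ln_b = np.polyfit(np.log(x), np.log(y - a), deg=1)   # slope = n, intercept = ln b
    b = np.exp(ln_b)
    y_pred = a + b * x ** n_exp
    return np.sum((y - y_pred) ** 2), b, n_exp

for a in (-314.0, -214.0, -114.0):       # the three trial values from the slides
    sse, b, n_exp = linear_fit_sse(a)
    print(a, round(sse, 3), round(b, 2), round(n_exp, 4))
```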

Example 1

a = −314.00,  b = 326.37,  n = 0.01

     y        x      ln(y − a)   ln x    y_predicted    SSE    Residual   % Abs Error
    8.86     0.40      5.78     -0.92       9.22       0.13     -0.36        4.07
   16.78     2.81      5.80      1.03      15.92       0.74      0.86        5.12
   24.36    28.62      5.82      3.35      24.05       0.09      0.31        1.26
   24.91    67.45      5.83      4.21      27.10       4.79     -2.19        8.78
   28.23   104.97      5.84      4.65      28.69       0.21     -0.46        1.62
   31.57   139.61      5.85      4.94      29.72       3.43      1.85        5.87
                                           SSE =      9.396                 26.73

a = −214.00,  b = 226.36,  n = 0.01

     y        x      ln(y − a)   ln x    y_predicted    SSE    Residual   % Abs Error
    8.86     0.40      5.41     -0.92       9.25       0.15     -0.39        4.39
   16.78     2.81      5.44      1.03      15.88       0.80      0.89        5.32
   24.36    28.62      5.47      3.35      24.02       0.11      0.34        1.38
   24.91    67.45      5.48      4.21      27.10       4.76     -2.18        8.76
   28.23   104.97      5.49      4.65      28.70       0.22     -0.47        1.66
   31.57   139.61      5.50      4.94      29.74       3.35      1.83        5.80
                                           SSE =      9.392                 27.30

Example 1

a = −114.00,  b = 126.34,  n = 0.03

     y        x      ln(y − a)   ln x    y_predicted    SSE    Residual   % Abs Error
    8.86     0.40      4.81     -0.92       9.32       0.21     -0.46        5.14
   16.78     2.81      4.87      1.03      15.80       0.95      0.98        5.81
   24.36    28.62      4.93      3.35      23.95       0.17      0.41        1.68
   24.91    67.45      4.93      4.21      27.08       4.70     -2.17        8.70
   28.23   104.97      4.96      4.65      28.73       0.25     -0.50        1.76
   31.57   139.61      4.98      4.94      29.80       3.13      1.77        5.60
                                           SSE =      9.406                 28.70

New value for a, i.e. a5 = −202.05

New values: a = −200, Δa = 4

Example 1

a = −204.00,  b = 216.36,  n = 0.02

     y        x      ln(y − a)   ln x    y_predicted    SSE    Residual   % Abs Error
    8.86     0.40      5.36     -0.92       9.26       0.15     -0.39        4.43
   16.78     2.81      5.40      1.03      15.88       0.81      0.90        5.35
   24.36    28.62      5.43      3.35      24.02       0.12      0.34        1.40
   24.91    67.45      5.43      4.21      27.10       4.76     -2.18        8.76
   28.23   104.97      5.45      4.65      28.70       0.22     -0.47        1.66
   31.57   139.61      5.46      4.94      29.74       3.34      1.83        5.78
                                           SSE =    9.39181                 27.39

a = −200.00,  b = 212.36,  n = 0.02

     y        x      ln(y − a)   ln x    y_predicted    SSE    Residual   % Abs Error
    8.86     0.40      5.34     -0.92       9.26       0.16     -0.39        4.45
   16.78     2.81      5.38      1.03      15.88       0.81      0.90        5.36
   24.36    28.62      5.41      3.35      24.01       0.12      0.34        1.41
   24.91    67.45      5.42      4.21      27.10       4.76     -2.18        8.76
   28.23   104.97      5.43      4.65      28.70       0.22     -0.47        1.67
   31.57   139.61      5.44      4.94      29.74       3.33      1.82        5.78
                                           SSE =    9.39178                 27.42

Example 1

a = −196.00,  b = 208.36,  n = 0.02

     y        x      ln(y − a)   ln x    y_predicted    SSE    Residual   % Abs Error
    8.86     0.40      5.32     -0.92       9.26       0.16     -0.40        4.47
   16.78     2.81      5.36      1.03      15.87       0.81      0.90        5.38
   24.36    28.62      5.40      3.35      24.01       0.12      0.34        1.42
   24.91    67.45      5.40      4.21      27.10       4.76     -2.18        8.75
   28.23   104.97      5.41      4.65      28.70       0.22     -0.47        1.67
   31.57   139.61      5.43      4.94      29.75       3.32      1.82        5.78
                                           SSE =    9.39177                 27.46

New value for a, i.e. a5 = −197.22

Final values after a few iterations:
  a = −196.52
  b = 208.88
  n = 0.0162

Curve Segmentation

Michaelis-Menten model for enzyme kinetics:

  v = Vmax S / (Km + S)

Curve segmentation:
  Linear (low S):     v = (Vmax/Km) S
  Nonlinear (mid S):  v = a S^b
  Constant (high S):  v = Vmax

A short sketch of fitting the three segments follows.
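A minimal sketch (NumPy assumed; the data and segment boundaries are hypothetical) of fitting each segment with a linear or linearized form instead of a single nonlinear regression:

```python
import numpy as np

# Hypothetical Michaelis-Menten-like data (substrate S, rate v)
S = np.array([0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0, 100.0])
v = np.array([0.09, 0.18, 0.42, 0.75, 1.20, 1.90, 2.40, 2.70, 2.90, 2.95])

low, high = S < 1.0, S > 20.0          # hypothetical segment boundaries
mid = ~(low | high)

# Low-S segment: v ≈ (Vmax/Km) * S, a straight line through the origin
slope = np.sum(S[low] * v[low]) / np.sum(S[low] ** 2)

# Mid-S segment: v ≈ a * S**b, linearized as ln v = ln a + b ln S
b_exp, ln_a = np.polyfit(np.log(S[mid]), np.log(v[mid]), deg=1)
a_coef = np.exp(ln_a)

# High-S segment: v ≈ Vmax, a constant
v_max = v[high].mean()

print(slope, a_coef, b_exp, v_max)
```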


References
1.

Akai. Applied Numerical Methods for Engineers. New


York: John Wiley and Sons, Inc., 1994.

2.

Chapra and Canale.


Canale. Numerical Methods for Engineers
with Software and Programming Applications. New

York: The McGraw


McGrawHill Companies, Inc., 2002.
3.

Perry, R. H., D. W. Green and J. O. Maloney. Perrys


Chemical Engineers Handbook. 6th ed. New York:
McGraw--Hill, Inc., 1984.
McGraw

4.

Press, Teukolsky,
Teukolsky, Vetterling and Flannery. Numerical

Recipes in Fortran 77: The Art of Scientific


Computing 2nd ed. Melbourne: Cambridge University

Press, 1992.

ChE 707 Lecture Notes by


A B Tengkiat

References
5.

http://www.wikipedia.org/

6.

Mathematics Source Library C & ASM.


http://mymathlib. webtrellis.net/index.html
