
Linear Regression

and
Curve Fitting

INTRODUCTION
 Regression ("step back"): it investigates the dependence of one variable, conventionally called the dependent variable, on one or more other variables, called independent variables.

 It provides an equation to be used for estimating or predicting the average value of the dependent variable from the known values of the independent variables.

 The relation between the expected value of the dependent variable and the independent variable is called the regression relation.

 If the dependence is represented by a straight-line equation, the regression is said to be linear; otherwise it is said to be curvilinear.
CURVE FITTING

Describes techniques to fit curves (curve fitting) to discrete data to obtain intermediate estimates.

There are two general approaches for curve fitting:

• Least squares regression:
Data exhibit a significant degree of scatter. The strategy is to derive a single curve that represents the general trend of the data.
• Interpolation:
Data are very precise. The strategy is to pass a curve or a series of curves through each of the points.
INTRODUCTION

In engineering, two types of applications are encountered:

 Trend analysis. Predicting values of the dependent variable; this may include extrapolation beyond the data points or interpolation between data points.

 Hypothesis testing. Comparing an existing mathematical model with measured data.
Introduction to Mathematical Modeling

SCATTER PLOT

[Figure: scatter plot, not reproduced in the text extraction]

CURVE FITTING

[Figure: scatter plot "Breeding Chows and Vizslas" — Disposition (0–10) vs. Appearance (0–10)]

PROCEDURE

 Given paired data points (xi, yi).

 Produce a scatterplot of the paired data points.

 Fit a linear equation
Y = aX + b
and its graph (curve) to the data.

 Evaluate the fit.

 Analyze the result.

TABLE OF DATA
Breeding Chows and Vizslas

Dog      Appearance   Disposition
Sam      4            2
Jake     6            6
Gus      7            7
Max      3            3
Suzie    7            10
Rover    8            10
Zeek     2            5
Rex      4            6
Tiesha   3            3
BJ       4            5
Missy    7            7
Mean     5            5.82
SCATTER PLOT OF DATA

[Figure: scatter plot "Breeding Chows and Vizslas" — Disposition (0–10) vs. Appearance (0–9)]
Plotting the Means (x̄, ȳ) = (5, 5.82)

[Figure: the same scatter plot "Breeding Chows and Vizslas" with the mean point marked]
ADDING A TRENDLINE

[Figure: the scatter plot with a fitted trendline; the vertical distances e1, e2, e3 from the points to the line are the residuals]

LEAST SQUARES FIT

 Minimize the error sum of squares:
Q = Σ (ei)²

 The smaller Q, the closer the coefficient of determination R² is to 1.

 A perfect fit (all points on the line) has R² = 1.
TRENDLINE AND EQUATION

[Figure: "Breeding Chows and Vizslas" scatter plot with fitted trendline]

D = 1.0476 A + 0.5801
R² = 0.6619
NO CORRELATION

 This graph shows essentially no correlation between the variables X and Y.

[Figure: scatter plot of Y vs. X with no visible trend]
R² = 0.0003
HIGH CORRELATION

 This graph shows a high degree of correlation between X and Y.

[Figure: scatter plot of Y vs. X with points lying close to a straight line]
Y = 1.6927 X + 3.0273
R² = 0.9996
OUTLIERS

 Outliers affect the degree of correlation.

 Outliers affect the fitted curve.

[Figure: the same data with an outlier included]
Y = 1.5698 X + 3.9045
R² = 0.7968
MATHEMATICAL BACKGROUND

 Arithmetic mean. The sum of the individual data points (yi) divided by the number of points (n):

ȳ = (Σ yi) / n,  i = 1, …, n

 Standard deviation. The most common measure of spread for a sample:

Sy = √(St / (n − 1)),  St = Σ (yi − ȳ)²
MATHEMATICAL BACKGROUND (CONT’D)

 Variance. Representation of spread by the square of the standard deviation:

Sy² = Σ (yi − ȳ)² / (n − 1)   or   Sy² = [Σ yi² − (Σ yi)² / n] / (n − 1)

 Coefficient of variation. Quantifies the spread of data relative to the mean:

c.v. = (Sy / ȳ) × 100%
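The statistics above can be sketched in a few lines of Python. This is a minimal illustration, using the Disposition column from the earlier table as sample data:

```python
# Mean, total sum of squares, variance, standard deviation, and
# coefficient of variation for a sample (Disposition values from the table).
y = [2, 6, 7, 3, 10, 10, 5, 6, 3, 5, 7]
n = len(y)
y_bar = sum(y) / n                          # arithmetic mean
S_t = sum((yi - y_bar) ** 2 for yi in y)    # total sum of squares around the mean
S_y2 = S_t / (n - 1)                        # sample variance
S_y = S_y2 ** 0.5                           # standard deviation
cv = S_y / y_bar * 100                      # coefficient of variation, in percent
print(round(y_bar, 2))                      # → 5.82
```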
Least Squares Regression

Linear Regression
Fitting a straight line to a set of paired observations:
(x1, y1), (x2, y2), …, (xn, yn)

y = a0 + a1x + e

a1 - slope
a0 - intercept
e - error, or residual, between the model and the observations
LINEAR REGRESSION: RESIDUAL
LINEAR REGRESSION: QUESTION

How do we find a0 and a1 so that the error is minimum?
LINEAR REGRESSION: CRITERIA FOR A “BEST” FIT

1. Minimize the sum of the residuals:

min Σ ei = Σ (yi − a0 − a1xi),  i = 1, …, n

This criterion is inadequate: errors of opposite sign cancel (for example, e1 = −e2 gives a zero sum even for a poor line).
LINEAR REGRESSION: CRITERIA FOR A “BEST” FIT

2. Minimize the sum of the absolute values of the residuals:

min Σ |ei| = Σ |yi − a0 − a1xi|,  i = 1, …, n
LINEAR REGRESSION: CRITERIA FOR A “BEST” FIT

3. Minimax criterion: minimize the maximum residual:

min max |ei| = min max |yi − a0 − a1xi|,  i = 1, …, n
LINEAR REGRESSION: LEAST SQUARES FIT

Sr = Σ ei² = Σ (yi,measured − yi,model)² = Σ (yi − a0 − a1xi)²,  i = 1, …, n

min Sr = min Σ (yi − a0 − a1xi)²

This criterion yields a unique line for a given set of data.
LINEAR REGRESSION: LEAST SQUARES FIT

min Sr = min Σ ei² = min Σ (yi − a0 − a1xi)²,  i = 1, …, n

The coefficients a0 and a1 that minimize Sr must satisfy the following conditions:

∂Sr/∂a0 = 0
∂Sr/∂a1 = 0
LINEAR REGRESSION:
DETERMINATION OF A0 AND A1

∂Sr/∂a0 = −2 Σ (yi − a0 − a1xi) = 0
∂Sr/∂a1 = −2 Σ (yi − a0 − a1xi) xi = 0

Expanding:

0 = Σ yi − Σ a0 − Σ a1xi
0 = Σ xiyi − Σ a0xi − Σ a1xi²

Since Σ a0 = n·a0, these become the normal equations:

n·a0 + (Σ xi) a1 = Σ yi
(Σ xi) a0 + (Σ xi²) a1 = Σ xiyi

Two equations with two unknowns; they can be solved simultaneously.
LINEAR REGRESSION:
DETERMINATION OF A0 AND A1

a1 = (n Σ xiyi − Σ xi Σ yi) / (n Σ xi² − (Σ xi)²)

a0 = ȳ − a1 x̄
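The slope and intercept formulas above translate directly into code. A minimal sketch (the function name and sample data are illustrative; any paired lists work):

```python
# Solve the normal equations for the least-squares slope a1 and intercept a0.
def fit_line(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sx2 = sum(xi ** 2 for xi in x)
    a1 = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)  # slope
    a0 = sy / n - a1 * sx / n                        # intercept: ȳ − a1·x̄
    return a0, a1

# Data lying exactly on y = 1 + 2x should be recovered exactly:
a0, a1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a0, a1)  # → 1.0 2.0
```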
ERROR QUANTIFICATION OF LINEAR REGRESSION

 The total sum of the squares around the mean for the dependent variable y is St:

St = Σ (yi − ȳ)²

 The sum of the squares of the residuals around the regression line is Sr:

Sr = Σ ei² = Σ (yi − a0 − a1xi)²,  i = 1, …, n
ERROR QUANTIFICATION OF LINEAR REGRESSION

 St − Sr quantifies the improvement, or error reduction, due to describing the data in terms of a straight line rather than as an average value:

r² = (St − Sr) / St

r²: coefficient of determination
r: correlation coefficient
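The r² formula can be sketched as a small helper. This assumes a0 and a1 have already been computed; the inputs below are hypothetical values for a perfect fit:

```python
# Coefficient of determination r² = (St − Sr) / St for a fitted line.
def r_squared(x, y, a0, a1):
    y_bar = sum(y) / len(y)
    S_t = sum((yi - y_bar) ** 2 for yi in y)                      # around the mean
    S_r = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))   # around the line
    return (S_t - S_r) / S_t

# All points lie on y = 1 + 2x, so Sr = 0 and r² = 1:
r2 = r_squared([0, 1, 2, 3], [1, 3, 5, 7], 1.0, 2.0)
print(r2)  # → 1.0
```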
ERROR QUANTIFICATION OF LINEAR REGRESSION

 For a perfect fit, Sr = 0 and r = r² = 1, signifying that the line explains 100 percent of the variability of the data.

 For r = r² = 0, Sr = St and the fit represents no improvement over simply reporting the mean.
LEAST SQUARES FIT OF A STRAIGHT
LINE: EXAMPLE

Fit a straight line to the x and y values in the following table:

xi    yi    xiyi    xi²
1     0.5   0.5     1
2     2.5   5       4
3     2     6       9
4     4     16      16
5     3.5   17.5    25
6     6     36      36
7     5.5   38.5    49
28    24    119.5   140

Σ xi = 28, Σ yi = 24.0, Σ xi² = 140, Σ xiyi = 119.5
x̄ = 28/7 = 4, ȳ = 24/7 = 3.428571
LEAST SQUARES FIT OF A STRAIGHT LINE:
EXAMPLE (CONT’D)

a1 = (n Σ xiyi − Σ xi Σ yi) / (n Σ xi² − (Σ xi)²)
   = (7 × 119.5 − 28 × 24) / (7 × 140 − 28²) = 0.8392857

a0 = ȳ − a1 x̄ = 3.428571 − 0.8392857 × 4 = 0.07142857

y = 0.07142857 + 0.8392857 x
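The arithmetic above is easy to check numerically. A sketch that reproduces the worked example:

```python
# Reproduce the worked example (n = 7) with the normal-equation formulas.
x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
n = len(x)
sx, sy = sum(x), sum(y)                      # 28, 24.0
sxy = sum(a * b for a, b in zip(x, y))       # 119.5
sx2 = sum(a * a for a in x)                  # 140
a1 = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
a0 = sy / n - a1 * sx / n
print(round(a1, 7), round(a0, 8))  # → 0.8392857 0.07142857
```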
LEAST SQUARES FIT OF A STRAIGHT LINE: EXAMPLE
(ERROR ANALYSIS)

xi    yi     (yi − ȳ)²   ei² = (yi − ŷi)²
1     0.5    8.5765      0.1687
2     2.5    0.8622      0.5625
3     2.0    2.0408      0.3473
4     4.0    0.3265      0.3265
5     3.5    0.0051      0.5896
6     6.0    6.6122      0.7972
7     5.5    4.2908      0.1993
28    24.0   22.7143     2.9911

St = Σ (yi − ȳ)² = 22.7143
Sr = Σ ei² = 2.9911
r² = (St − Sr) / St = 0.868
r = √0.868 = 0.932
LEAST SQUARES FIT OF A STRAIGHT LINE:
EXAMPLE (ERROR ANALYSIS)

• The standard deviation (quantifies the spread around the mean):

sy = √(St / (n − 1)) = √(22.7143 / (7 − 1)) = 1.9457

• The standard error of estimate (quantifies the spread around the regression line):

sy/x = √(Sr / (n − 2)) = √(2.9911 / (7 − 2)) = 0.7735

Because sy/x < sy, the linear regression model fits the data well.
ALGORITHM FOR LINEAR REGRESSION
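A minimal sketch of the complete algorithm, combining the normal-equation solution with the error measures defined earlier (the function name is illustrative):

```python
# Fit y = a0 + a1*x by least squares, then quantify the error.
def linear_regression(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    a1 = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)   # slope (normal equations)
    a0 = sy / n - a1 * sx / n                         # intercept: ȳ − a1·x̄
    y_bar = sy / n
    S_t = sum((yi - y_bar) ** 2 for yi in y)          # spread around the mean
    S_r = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))  # around the line
    r2 = (S_t - S_r) / S_t                            # coefficient of determination
    s_yx = (S_r / (n - 2)) ** 0.5                     # standard error of estimate
    return a0, a1, r2, s_yx

# Applied to the worked example from the previous slides:
a0, a1, r2, s_yx = linear_regression([1, 2, 3, 4, 5, 6, 7],
                                     [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])
print(round(r2, 3))  # → 0.868
```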
LINEARIZATION OF NONLINEAR RELATIONSHIPS

• Linear regression assumes that the relationship between the dependent and independent variables is linear.
• However, a few types of nonlinear functions can be transformed into linear regression problems:
 The exponential equation.
 The power equation.
 The saturation-growth-rate equation.
LINEARIZATION OF NONLINEAR RELATIONSHIPS
1. THE EXPONENTIAL EQUATION

y = a1·e^(b1·x)

Taking natural logarithms:

ln y = ln a1 + b1·x

so with y* = ln y this is a straight line, y* = a0 + a1x, with intercept a0 = ln a1 and slope b1.
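The transform can be sketched as follows. The data below are synthetic, generated from y = 2e^x so the fit should recover a1 = 2 and b1 = 1 exactly:

```python
# Fit y = a1*e^(b1*x) by regressing ln(y) on x with least squares.
import math

x = [0, 1, 2, 3]
y = [2.0, 2.0 * math.e, 2.0 * math.e ** 2, 2.0 * math.e ** 3]  # exact y = 2e^x
ystar = [math.log(v) for v in y]           # ln y = ln a1 + b1*x: a straight line
n = len(x)
sx, sy = sum(x), sum(ystar)
sxy = sum(a * b for a, b in zip(x, ystar))
sx2 = sum(a * a for a in x)
b1 = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)  # slope of the transformed fit
a1 = math.exp(sy / n - b1 * sx / n)             # intercept, back-transformed
print(round(a1, 6), round(b1, 6))  # → 2.0 1.0
```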
LINEARIZATION OF NONLINEAR RELATIONSHIPS
2. THE POWER EQUATION

y = a2·x^(b2)

Taking base-10 logarithms:

log y = log a2 + b2·log x

so with y* = log y and x* = log x this is a straight line, y* = a0 + a1x*, with intercept a0 = log a2 and slope b2.
LINEARIZATION OF NONLINEAR RELATIONSHIPS
3. THE SATURATION-GROWTH-RATE EQUATION

y = a3·x / (b3 + x)

Taking reciprocals:

1/y = 1/a3 + (b3/a3)·(1/x)

so with y* = 1/y and x* = 1/x this is a straight line, y* = a0 + a1x*, with a0 = 1/a3 and a1 = b3/a3.
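The reciprocal transform can be sketched the same way. The data below are synthetic, generated with a3 = 10 and b3 = 5, so the fit should recover those values:

```python
# Fit y = a3*x/(b3 + x) by regressing 1/y on 1/x with least squares.
a3_true, b3_true = 10.0, 5.0
x = [1.0, 2.0, 4.0, 8.0]
y = [a3_true * xi / (b3_true + xi) for xi in x]   # synthetic saturation data
xs = [1 / xi for xi in x]                         # x* = 1/x
ys = [1 / yi for yi in y]                         # y* = 1/y
n = len(xs)
sx, sy = sum(xs), sum(ys)
sxy = sum(a * b for a, b in zip(xs, ys))
sx2 = sum(a * a for a in xs)
a1 = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)    # = b3/a3
a0 = sy / n - a1 * sx / n                         # = 1/a3
a3, b3 = 1 / a0, a1 / a0                          # back-transform the parameters
print(round(a3, 6), round(b3, 6))  # → 10.0 5.0
```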
EXAMPLE

Fit the equation y = a2·x^(b2) to the data in the following table.

Taking logarithms: log y = log(a2·x^(b2)) = log a2 + b2·log x. Let Y* = log y, X* = log x, a0 = log a2, a1 = b2; then Y* = a0 + a1X*.

xi   yi    X* = log xi   Y* = log yi
1    0.5   0             -0.301
2    1.7   0.301         0.226
3    3.4   0.477         0.534
4    5.7   0.602         0.753
5    8.4   0.699         0.922
15   19.7  2.079         2.141
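The example can be completed numerically: regress Y* on X* with the least-squares formulas, then back-transform a2 = 10^a0 and b2 = a1:

```python
# Fit the power equation y = a2*x^b2 to the example data via log-log regression.
import math

x = [1, 2, 3, 4, 5]
y = [0.5, 1.7, 3.4, 5.7, 8.4]
Xs = [math.log10(v) for v in x]
Ys = [math.log10(v) for v in y]
n = len(Xs)
sx, sy = sum(Xs), sum(Ys)
sxy = sum(a * b for a, b in zip(Xs, Ys))
sx2 = sum(a * a for a in Xs)
a1 = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)  # slope of the log-log fit = b2
a0 = sy / n - a1 * sx / n                        # intercept = log a2
a2, b2 = 10 ** a0, a1                            # back-transform
print(round(a2, 2), round(b2, 2))  # → 0.5 1.75
```

So the fitted model is approximately y = 0.5·x^1.75.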
