# Motilal Nehru National Institute of Technology

## Civil Engineering Department

Computer Based Numerical Techniques
CE-401

Least Squares Regression

CURVE FITTING

There are two general approaches to curve fitting:

- Least-squares regression: the data exhibit a significant degree of scatter. The strategy is to derive a single curve that represents the general trend of the data.
- Interpolation: the data are very precise. The strategy is to pass a curve, or a series of curves, through each of the points.

Introduction

In engineering, two types of applications are encountered:

- Trend analysis: predicting values of the dependent variable. This may involve extrapolation beyond the data points or interpolation between them.
- Hypothesis testing: comparing an existing mathematical model with measured data.

Mathematical Background

Arithmetic mean. The sum of the individual data points ($y_i$) divided by the number of points ($n$):

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Standard deviation. The most common measure of spread for a sample:

$$S_y = \sqrt{\frac{S_t}{n-1}}, \qquad S_t = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
Mathematical Background (contd)

Variance. Representation of spread by the square of the standard deviation:

$$S_y^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$$

or

$$S_y^2 = \frac{\sum y_i^2 - \left(\sum y_i\right)^2 / n}{n-1}$$

Coefficient of variation. Quantifies the spread of the data relative to their mean:

$$\text{c.v.} = \frac{S_y}{\bar{y}} \times 100\%$$
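A small numerical check of these definitions may help. The sketch below assumes the data are held in a plain Python list and reuses the $y$ values of the straight-line example that appears later in these notes; the function name `describe` is hypothetical.

```python
import math

def describe(y):
    """Return mean, S_t, standard deviation, variance (two forms) and c.v. in percent."""
    n = len(y)
    mean = sum(y) / n
    # Total sum of squares around the mean, S_t
    s_t = sum((yi - mean) ** 2 for yi in y)
    s_y = math.sqrt(s_t / (n - 1))
    # Variance from the definition and from the computational (running-sums) form
    var_def = s_t / (n - 1)
    var_alt = (sum(yi ** 2 for yi in y) - sum(y) ** 2 / n) / (n - 1)
    cv = s_y / mean * 100.0
    return mean, s_t, s_y, var_def, var_alt, cv

# y values from the straight-line example later in these notes
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
mean, s_t, s_y, var_def, var_alt, cv = describe(y)
print(f"mean={mean:.6f} S_t={s_t:.4f} s_y={s_y:.4f} var={var_def:.4f}/{var_alt:.4f} cv={cv:.1f}%")
```

Both variance forms agree here. The second form needs only running sums of $y_i$ and $y_i^2$, but it can lose precision when the values are large compared with their spread.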
Least Squares Regression
Chapter 17
Linear Regression

Fitting a straight line to a set of paired observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$:

$$y = a_0 + a_1 x + e$$

where $a_1$ is the slope, $a_0$ is the intercept, and $e$ is the error, or residual, between the model and the observations.
Linear Regression: Residual
Linear Regression: Question

How can $a_0$ and $a_1$ be found so that the error is a minimum?
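Linear Regression: Criteria for a Best Fit

Minimize the sum of the residuals:

$$\min \sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)$$

This criterion is inadequate: positive and negative residuals cancel (for example, $e_1 = -e_2$), so any straight line passing through $(\bar{x}, \bar{y})$ makes the sum zero.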
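Minimize the sum of the absolute values of the residuals:

$$\min \sum_{i=1}^{n} |e_i| = \sum_{i=1}^{n} |y_i - a_0 - a_1 x_i|$$

This criterion does not, in general, yield a unique best-fit line.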
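Minimize the maximum absolute residual (minimax criterion):

$$\min_{a_0, a_1} \max_{1 \le i \le n} |e_i| = \min_{a_0, a_1} \max_{1 \le i \le n} |y_i - a_0 - a_1 x_i|$$

This criterion is ill-suited to regression because it gives undue influence to a single outlier.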
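To see why the first criterion fails, the sketch below evaluates all three candidate criteria for two very different lines that both pass through the centroid $(\bar{x}, \bar{y})$. It assumes the data of the later straight-line example; the function name `criteria` and the two slopes are illustrative choices (one of them is the least-squares slope from that example).

```python
def criteria(a0, a1, x, y):
    """Return (sum e_i, sum |e_i|, max |e_i|) for the candidate line y = a0 + a1*x."""
    e = [yi - a0 - a1 * xi for xi, yi in zip(x, y)]
    return sum(e), sum(abs(ei) for ei in e), max(abs(ei) for ei in e)

x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
xbar, ybar = sum(x) / len(x), sum(y) / len(y)

# Two very different lines, each forced through the centroid (xbar, ybar)
results = {}
for slope in (0.8392857, -1.0):
    intercept = ybar - slope * xbar
    results[slope] = criteria(intercept, slope, x, y)
    print(slope, results[slope])
```

The sum of residuals is zero (up to rounding) for both lines, even though the line with negative slope clearly fits worse, as its much larger $\sum |e_i|$ shows.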
Linear Regression: Least Squares Fit

Minimize the sum of the squares of the residuals:

$$\min S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_{i,\text{measured}} - y_{i,\text{model}}\right)^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$

This criterion yields a unique line for a given set of data.
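Unlike the plain sum of residuals, $S_r$ does separate good lines from bad ones. The sketch below compares $S_r$ for the least-squares line of the later worked example and for the horizontal line $y = \bar{y}$; both pass through the centroid, so both have $\sum e_i = 0$. The helper name `sum_sq_residuals` is hypothetical.

```python
def sum_sq_residuals(a0, a1, x, y):
    """S_r: sum of squared residuals of the line y = a0 + a1*x."""
    return sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]

# Least-squares coefficients obtained in the worked example of these notes
sr_fit = sum_sq_residuals(0.07142857, 0.8392857, x, y)
# Horizontal line y = ybar: also passes through the centroid, so sum(e) = 0 as well
ybar = sum(y) / len(y)
sr_flat = sum_sq_residuals(ybar, 0.0, x, y)
print(f"S_r (least-squares line) = {sr_fit:.4f}")
print(f"S_r (horizontal line)    = {sr_flat:.4f}")
```

For the horizontal line, $S_r$ equals $S_t$, the spread around the mean, which anticipates the error measures introduced below.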
The coefficients $a_0$ and $a_1$ that minimize $S_r$ must satisfy the following conditions:

$$\frac{\partial S_r}{\partial a_0} = 0, \qquad \frac{\partial S_r}{\partial a_1} = 0$$

Differentiating $S_r = \sum (y_i - a_0 - a_1 x_i)^2$:

$$\frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i) = 0$$

$$\frac{\partial S_r}{\partial a_1} = -2 \sum \left[(y_i - a_0 - a_1 x_i)\, x_i\right] = 0$$

which give

$$0 = \sum y_i - \sum a_0 - \sum a_1 x_i$$

$$0 = \sum x_i y_i - \sum a_0 x_i - \sum a_1 x_i^2$$
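The two conditions can be checked numerically. The sketch below evaluates the analytic partial derivatives above at the least-squares coefficients of the later worked example (written as exact fractions) and at a slightly shifted intercept; the function name `gradient` is hypothetical.

```python
x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]

def gradient(a0, a1):
    """Partial derivatives of S_r = sum (y_i - a0 - a1*x_i)^2 with respect to a0 and a1."""
    d_a0 = -2 * sum(yi - a0 - a1 * xi for xi, yi in zip(x, y))
    d_a1 = -2 * sum((yi - a0 - a1 * xi) * xi for xi, yi in zip(x, y))
    return d_a0, d_a1

# Least-squares coefficients from the worked example in these notes (exact fractions)
a1 = 164.5 / 196
a0 = 24 / 7 - a1 * 4

g0 = gradient(a0, a1)        # both components vanish at the least-squares line
g1 = gradient(a0 + 0.1, a1)  # shifting the intercept breaks both conditions
print(g0, g1)
```

Shifting the intercept by 0.1 makes every residual 0.1 smaller, so $\partial S_r/\partial a_0$ becomes $-2 \cdot 7 \cdot (-0.1) = 1.4$ instead of zero.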
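Linear Regression: Determination of $a_0$ and $a_1$

Since $\sum a_0 = n a_0$, the conditions become the normal equations:

$$n a_0 + \left(\sum x_i\right) a_1 = \sum y_i$$

$$\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 = \sum x_i y_i$$

These are 2 equations in 2 unknowns and can be solved simultaneously:

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$$

$$a_0 = \bar{y} - a_1 \bar{x}$$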
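The closed-form solution translates directly into code. This is a minimal sketch assuming plain Python lists of equal length; the function name `linear_fit` is hypothetical. It is applied to the data of the worked example below.

```python
def linear_fit(x, y):
    """Least-squares straight line y = a0 + a1*x from the closed-form normal-equation solution."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope
    a0 = sy / n - a1 * sx / n                        # intercept: ybar - a1 * xbar
    return a0, a1

x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
a0, a1 = linear_fit(x, y)
print(f"a0 = {a0:.8f}, a1 = {a1:.7f}")   # a0 = 0.07142857, a1 = 0.8392857
```

The denominator $n \sum x_i^2 - (\sum x_i)^2$ is zero when all $x_i$ are equal, in which case no unique line exists and the function raises `ZeroDivisionError`.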
Error Quantification of Linear Regression

The total sum of the squares around the mean for the dependent variable, $y$, is $S_t$:

$$S_t = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

The sum of the squares of the residuals around the regression line is $S_r$:

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$
$S_t - S_r$ quantifies the improvement, or error reduction, due to describing the data in terms of a straight line rather than as an average value:

$$r^2 = \frac{S_t - S_r}{S_t}$$

- $r^2$: coefficient of determination
- $r$: correlation coefficient
For a perfect fit, $S_r = 0$ and $r = r^2 = 1$, signifying that the line explains 100 percent of the variability of the data.

For $r = r^2 = 0$, $S_r = S_t$ and the fit represents no improvement over the mean.
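The sketch below computes $S_t$, $S_r$, $r^2$ and $r$ for a given line, using the coefficients of the worked example that follows. It assumes $a_0, a_1$ are least-squares coefficients, so that $0 \le r^2 \le 1$; the function name `fit_quality` is hypothetical. For a straight-line fit, $r$ carries the sign of the slope, which the code applies with `math.copysign`.

```python
import math

def fit_quality(x, y, a0, a1):
    """Return (S_t, S_r, r^2, r) for the least-squares line y = a0 + a1*x."""
    ybar = sum(y) / len(y)
    s_t = sum((yi - ybar) ** 2 for yi in y)                      # spread around the mean
    s_r = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))  # spread around the line
    r2 = (s_t - s_r) / s_t
    r = math.copysign(math.sqrt(r2), a1)                         # r takes the sign of the slope
    return s_t, s_r, r2, r

x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
s_t, s_r, r2, r = fit_quality(x, y, 0.07142857, 0.8392857)
print(f"S_t={s_t:.4f}  S_r={s_r:.4f}  r^2={r2:.3f}  r={r:.3f}")
```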
Least Squares Fit of a Straight Line: Example

Fit a straight line to the $x$ and $y$ values in the following table:

| $x_i$ | $y_i$ | $x_i y_i$ | $x_i^2$ |
|---|---|---|---|
| 1 | 0.5 | 0.5 | 1 |
| 2 | 2.5 | 5 | 4 |
| 3 | 2 | 6 | 9 |
| 4 | 4 | 16 | 16 |
| 5 | 3.5 | 17.5 | 25 |
| 6 | 6 | 36 | 36 |
| 7 | 5.5 | 38.5 | 49 |
| Σ: 28 | 24 | 119.5 | 140 |

$$\sum x_i = 28, \quad \sum y_i = 24.0, \quad \sum x_i y_i = 119.5, \quad \sum x_i^2 = 140$$

$$\bar{x} = \frac{28}{7} = 4, \qquad \bar{y} = \frac{24}{7} = 3.428571$$
Least Squares Fit of a Straight Line: Example (contd)

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2} = \frac{7(119.5) - 28(24)}{7(140) - 28^2} = 0.8392857$$

$$a_0 = \bar{y} - a_1 \bar{x} = 3.428571 - 0.8392857(4) = 0.07142857$$

$$y = 0.07142857 + 0.8392857\,x$$
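Least Squares Fit of a Straight Line: Example (Error Analysis)

| $x_i$ | $y_i$ | $(y_i - \bar{y})^2$ | $e_i^2 = (y_i - \hat{y}_i)^2$ |
|---|---|---|---|
| 1 | 0.5 | 8.5765 | 0.1687 |
| 2 | 2.5 | 0.8622 | 0.5625 |
| 3 | 2.0 | 2.0408 | 0.3473 |
| 4 | 4.0 | 0.3265 | 0.3265 |
| 5 | 3.5 | 0.0051 | 0.5896 |
| 6 | 6.0 | 6.6122 | 0.7972 |
| 7 | 5.5 | 4.2908 | 0.1993 |
| Σ: 28 | 24.0 | 22.7143 | 2.9911 |

Here $\hat{y}_i = a_0 + a_1 x_i$ is the model prediction.

$$S_t = \sum (y_i - \bar{y})^2 = 22.7143, \qquad S_r = \sum e_i^2 = 2.9911$$

$$r^2 = \frac{S_t - S_r}{S_t} = 0.868, \qquad r = \sqrt{0.868} = 0.932$$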
The standard deviation (quantifies the spread around the mean):

$$s_y = \sqrt{\frac{S_t}{n-1}} = \sqrt{\frac{22.7143}{7-1}} = 1.9457$$

The standard error of the estimate (quantifies the spread around the regression line):

$$s_{y/x} = \sqrt{\frac{S_r}{n-2}} = \sqrt{\frac{2.9911}{7-2}} = 0.7735$$

Because $s_{y/x} < s_y$, the linear regression model has merit: the data are spread less around the line than around the mean.
Algorithm for linear regression
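The algorithm figure on this slide did not survive extraction. The sketch below is one common two-pass form of such an algorithm (accumulate the sums, compute $a_1$ and $a_0$, then accumulate $S_t$ and $S_r$), not necessarily the exact version shown on the slide. The function name `regress` and its return order are assumptions.

```python
import math

def regress(x, y):
    """Least-squares line y = a0 + a1*x with error statistics.

    Returns (a0, a1, s_y/x, r^2).
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points to estimate s_y/x")
    # Pass 1: accumulate the sums needed by the normal equations
    sumx = sumy = sumxy = sumx2 = 0.0
    for xi, yi in zip(x, y):
        sumx += xi
        sumy += yi
        sumxy += xi * yi
        sumx2 += xi * xi
    xm, ym = sumx / n, sumy / n
    a1 = (n * sumxy - sumx * sumy) / (n * sumx2 - sumx * sumx)
    a0 = ym - a1 * xm
    # Pass 2: spread around the mean (S_t) and around the line (S_r)
    st = sr = 0.0
    for xi, yi in zip(x, y):
        st += (yi - ym) ** 2
        sr += (yi - a0 - a1 * xi) ** 2
    syx = math.sqrt(sr / (n - 2))   # standard error of the estimate
    r2 = (st - sr) / st             # coefficient of determination
    return a0, a1, syx, r2

x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
a0, a1, syx, r2 = regress(x, y)
sy = math.sqrt(sum((yi - sum(y) / len(y)) ** 2 for yi in y) / (len(y) - 1))
print(f"a0={a0:.5f} a1={a1:.5f} s_y/x={syx:.4f} s_y={sy:.4f} r^2={r2:.3f}")
```

The second pass uses the mean rather than running sums of $y_i^2$, which avoids the precision loss mentioned for the computational variance formula.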
Linearization of Nonlinear Relationships

Linear regression assumes that the relationship between the dependent and independent variables is linear. However, a few types of nonlinear functions can be transformed into linear regression problems:

- The exponential equation.
- The power equation.
- The saturation-growth-rate equation.

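1. The exponential equation:

$$y = a_1 e^{b_1 x}$$

Taking the natural logarithm of both sides:

$$\ln y = \ln a_1 + b_1 x$$

This is a straight line in the transformed variable $y^* = \ln y$, with intercept $\ln a_1$ and slope $b_1$.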
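A minimal sketch of the exponential fit: transform $y$ to $\ln y$, fit a straight line, then map the intercept back with $a_1 = e^{\text{intercept}}$. It assumes all $y_i > 0$. The function name `fit_exponential` is hypothetical, and the test data are synthetic, generated exactly from $y = 2e^{0.5x}$, so the fit should recover those constants.

```python
import math

def fit_exponential(x, y):
    """Fit y = a1 * exp(b1 * x) by linear least squares on (x, ln y). Requires all y > 0."""
    n = len(x)
    ys = [math.log(yi) for yi in y]          # transformed variable y* = ln y
    sx, sy = sum(x), sum(ys)
    sxy = sum(xi * yi for xi, yi in zip(x, ys))
    sxx = sum(xi * xi for xi in x)
    slope = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    intercept = sy / n - slope * sx / n
    return math.exp(intercept), slope        # a1 = e^intercept, b1 = slope

# Hypothetical test data generated exactly from y = 2 e^{0.5 x}
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 * math.exp(0.5 * xi) for xi in x]
a1, b1 = fit_exponential(x, y)
print(a1, b1)   # approximately 2.0 and 0.5
```

Note that this minimizes the squared residuals of $\ln y$, not of $y$ itself, so with scattered data the result can differ from a direct nonlinear least-squares fit.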
2. The power equation:

$$y = a_2 x^{b_2}$$

Taking the base-10 logarithm of both sides:

$$\log y = \log a_2 + b_2 \log x$$

$$y^* = a_0 + a_1 x^*$$

with $y^* = \log y$, $x^* = \log x$, $a_0 = \log a_2$ and $a_1 = b_2$.
3. The saturation-growth-rate equation:

$$y = a_3 \frac{x}{b_3 + x}$$

Inverting both sides:

$$\frac{1}{y} = \frac{1}{a_3} + \frac{b_3}{a_3} \cdot \frac{1}{x}$$

$$y^* = a_0 + a_1 x^*$$

with $y^* = 1/y$, $x^* = 1/x$, $a_0 = 1/a_3$ and $a_1 = b_3/a_3$.
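The same pattern applies to the saturation-growth-rate model: fit a line to $(1/x, 1/y)$, then recover $a_3 = 1/a_0$ and $b_3 = a_1 a_3$. This sketch assumes no $x_i$ or $y_i$ is zero. The function name `fit_saturation_growth` is hypothetical, and the test data are synthetic, generated exactly from $y = 5x/(2 + x)$.

```python
def fit_saturation_growth(x, y):
    """Fit y = a3 * x / (b3 + x) by linear least squares on (1/x, 1/y). Requires x, y != 0."""
    n = len(x)
    xs = [1.0 / xi for xi in x]   # x* = 1/x
    ys = [1.0 / yi for yi in y]   # y* = 1/y
    sx, sy = sum(xs), sum(ys)
    sxy = sum(u * v for u, v in zip(xs, ys))
    sxx = sum(u * u for u in xs)
    a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a0 = sy / n - a1 * sx / n
    a3 = 1.0 / a0                 # a0 = 1/a3
    b3 = a1 * a3                  # a1 = b3/a3
    return a3, b3

# Hypothetical test data generated exactly from y = 5x / (2 + x)
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [5.0 * xi / (2.0 + xi) for xi in x]
a3, b3 = fit_saturation_growth(x, y)
print(a3, b3)   # approximately 5.0 and 2.0
```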
Example

Fit the power equation

$$y = a_2 x^{b_2}$$

to the data in the following table:

| $x_i$ | $y_i$ | $X^*_i = \log x_i$ | $Y^*_i = \log y_i$ |
|---|---|---|---|
| 1 | 0.5 | 0 | -0.301 |
| 2 | 1.7 | 0.301 | 0.230 |
| 3 | 3.4 | 0.477 | 0.531 |
| 4 | 5.7 | 0.602 | 0.756 |
| 5 | 8.4 | 0.699 | 0.924 |
| Σ: 15 | 19.7 | 2.079 | 2.141 |

Taking logarithms, $\log y = \log(a_2 x^{b_2}) = \log a_2 + b_2 \log x$. Let $Y^* = \log y$, $X^* = \log x$, $a_0 = \log a_2$ and $a_1 = b_2$. Then

$$Y^* = a_0 + a_1 X^*$$
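The straight-line regression is carried out on the transformed data:

| $x_i$ | $y_i$ | $X^*_i = \log x_i$ | $Y^*_i = \log y_i$ | $X^*_i Y^*_i$ | $X^{*2}_i$ |
|---|---|---|---|---|---|
| 1 | 0.5 | 0.0000 | -0.3010 | 0.0000 | 0.0000 |
| 2 | 1.7 | 0.3010 | 0.2304 | 0.0694 | 0.0906 |
| 3 | 3.4 | 0.4771 | 0.5315 | 0.2536 | 0.2276 |
| 4 | 5.7 | 0.6021 | 0.7559 | 0.4551 | 0.3625 |
| 5 | 8.4 | 0.6990 | 0.9243 | 0.6460 | 0.4886 |
| Σ: 15 | 19.700 | 2.079 | 2.141 | 1.424 | 1.169 |

$$a_1 = \frac{n \sum X^*_i Y^*_i - \sum X^*_i \sum Y^*_i}{n \sum X^{*2}_i - \left(\sum X^*_i\right)^2} = \frac{5(1.424) - 2.079(2.141)}{5(1.169) - 2.079^2} = 1.75$$

$$a_0 = \bar{Y}^* - a_1 \bar{X}^* = 0.4282 - 1.75(0.41584) = -0.300$$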
Linearization of Nonlinear Functions: Example

$$\log y = -0.300 + 1.75 \log x$$

so $a_2 = 10^{-0.300} \approx 0.50$ and $b_2 = 1.75$:

$$y = 0.50\, x^{1.75}$$
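The worked power-equation example can be reproduced with a short sketch: take base-10 logarithms of both variables, fit a straight line, and convert the intercept back with $a_2 = 10^{a_0}$. It assumes all $x_i, y_i > 0$; the function name `fit_power` is hypothetical.

```python
import math

def fit_power(x, y):
    """Fit y = a2 * x**b2 by linear least squares on (log10 x, log10 y). Requires x, y > 0."""
    n = len(x)
    xs = [math.log10(xi) for xi in x]   # X* = log x
    ys = [math.log10(yi) for yi in y]   # Y* = log y
    sx, sy = sum(xs), sum(ys)
    sxy = sum(u * v for u, v in zip(xs, ys))
    sxx = sum(u * u for u in xs)
    a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope = b2
    a0 = sy / n - a1 * sx / n                        # intercept = log10(a2)
    return 10 ** a0, a1, a0

x = [1, 2, 3, 4, 5]
y = [0.5, 1.7, 3.4, 5.7, 8.4]
a2, b2, a0 = fit_power(x, y)
print(f"log y = {a0:.3f} + {b2:.3f} log x  ->  y = {a2:.3f} x^{b2:.3f}")
```

With unrounded logarithms this gives an intercept of about $-0.300$ and a slope of about $1.75$, so $y \approx 0.50\,x^{1.75}$.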
Polynomial Regression

Some engineering data are poorly represented by a straight line. For these cases, a curve is better suited to fit the data. The least-squares method can readily be extended to fit the data to higher-order polynomials.
Polynomial Regression (contd)
A parabola is preferable
A 2nd-order polynomial (quadratic) is defined by:

$$y = a_0 + a_1 x + a_2 x^2 + e$$

The residuals between the model and the data:

$$e_i = y_i - a_0 - a_1 x_i - a_2 x_i^2$$

The sum of the squares of the residuals:

$$S_r = \sum e_i^2 = \sum \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right)^2$$
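Setting the partial derivatives of $S_r$ to zero:

$$\frac{\partial S_r}{\partial a_0} = -2 \sum \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right) = 0$$

$$\frac{\partial S_r}{\partial a_1} = -2 \sum x_i \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right) = 0$$

$$\frac{\partial S_r}{\partial a_2} = -2 \sum x_i^2 \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right) = 0$$

which give the normal equations

$$n a_0 + \left(\sum x_i\right) a_1 + \left(\sum x_i^2\right) a_2 = \sum y_i$$

$$\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 + \left(\sum x_i^3\right) a_2 = \sum x_i y_i$$

$$\left(\sum x_i^2\right) a_0 + \left(\sum x_i^3\right) a_1 + \left(\sum x_i^4\right) a_2 = \sum x_i^2 y_i$$

These are 3 linear equations in 3 unknowns ($a_0$, $a_1$, $a_2$) that can be solved simultaneously.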
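A 3×3 system of equations must be solved to determine the coefficients of the polynomial:

$$\begin{bmatrix} n & \sum x_i & \sum x_i^2 \\ \sum x_i & \sum x_i^2 & \sum x_i^3 \\ \sum x_i^2 & \sum x_i^3 & \sum x_i^4 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} \sum y_i \\ \sum x_i y_i \\ \sum x_i^2 y_i \end{Bmatrix}$$

The standard error and the coefficient of determination:

$$s_{y/x} = \sqrt{\frac{S_r}{n-3}}, \qquad r^2 = \frac{S_t - S_r}{S_t}$$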
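The 3×3 system can be assembled from the sums and solved with Gaussian elimination. The sketch below uses partial pivoting for the elimination and applies it to the data of the example that follows. The names `solve` and `quadratic_fit` are hypothetical, and the solver is a textbook routine written out for illustration rather than a library call.

```python
def solve(A, b):
    """Solve the square system A a = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]          # augmented matrix [A | b]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))     # row with the largest pivot
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):                            # eliminate below the pivot
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    a = [0.0] * n
    for i in range(n - 1, -1, -1):                           # back substitution
        a[i] = (M[i][n] - sum(M[i][j] * a[j] for j in range(i + 1, n))) / M[i][i]
    return a

def quadratic_fit(x, y):
    """Build and solve the 3x3 normal equations for y = a0 + a1*x + a2*x^2."""
    def sx(k):    # sum of x_i^k
        return sum(xi ** k for xi in x)
    def sxy(k):   # sum of x_i^k * y_i
        return sum(xi ** k * yi for xi, yi in zip(x, y))
    A = [[len(x), sx(1), sx(2)],
         [sx(1),  sx(2), sx(3)],
         [sx(2),  sx(3), sx(4)]]
    b = [sxy(0), sxy(1), sxy(2)]
    return solve(A, b)

x = [0, 1, 2, 3, 4, 5]
y = [2.1, 7.7, 13.6, 27.2, 40.9, 61.1]
a0, a1, a2 = quadratic_fit(x, y)
print(f"y = {a0:.5f} + {a1:.5f} x + {a2:.5f} x^2")
```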
General: the $m$th-order polynomial is

$$y = a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m + e$$

A system of $(m+1) \times (m+1)$ linear equations must be solved to determine the coefficients of the $m$th-order polynomial.

The standard error:

$$s_{y/x} = \sqrt{\frac{S_r}{n - (m+1)}}$$

The coefficient of determination:

$$r^2 = \frac{S_t - S_r}{S_t}$$
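For a general order $m$, entry $(j, k)$ of the normal-equation matrix is $\sum x_i^{j+k}$ and entry $j$ of the right-hand side is $\sum x_i^j y_i$. The sketch below builds that system, solves it, and reports $s_{y/x}$ and $r^2$ as defined above. The names `gauss_solve` and `polyfit` are hypothetical (unrelated to NumPy's function of the same name), and the usage repeats the second-order example below.

```python
import math

def gauss_solve(A, b):
    """Solve A a = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):
        sol[i] = (M[i][n] - sum(M[i][j] * sol[j] for j in range(i + 1, n))) / M[i][i]
    return sol

def polyfit(x, y, m):
    """Least-squares m-th order polynomial. Returns (coefficients a0..am, s_y/x, r^2)."""
    n = len(x)
    if n <= m + 1:
        raise ValueError("need more than m + 1 points to estimate s_y/x")
    # Normal equations: A[j][k] = sum x_i^(j+k), b[j] = sum x_i^j * y_i
    A = [[sum(xi ** (j + k) for xi in x) for k in range(m + 1)] for j in range(m + 1)]
    b = [sum(xi ** j * yi for xi, yi in zip(x, y)) for j in range(m + 1)]
    a = gauss_solve(A, b)
    ybar = sum(y) / n
    model = [sum(a[k] * xi ** k for k in range(m + 1)) for xi in x]
    s_r = sum((yi - fi) ** 2 for yi, fi in zip(y, model))
    s_t = sum((yi - ybar) ** 2 for yi in y)
    return a, math.sqrt(s_r / (n - (m + 1))), (s_t - s_r) / s_t

x = [0, 1, 2, 3, 4, 5]
y = [2.1, 7.7, 13.6, 27.2, 40.9, 61.1]
a, syx, r2 = polyfit(x, y, 2)
print([round(c, 5) for c in a], round(syx, 2), round(r2, 5))
```

With $m = 1$ the same function reduces to straight-line regression with $s_{y/x} = \sqrt{S_r/(n-2)}$. For high orders the explicit normal equations become ill-conditioned, so this sketch is only suitable for low-order fits.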
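Polynomial Regression: Example

Fit a second-order polynomial to the data:

| $x_i$ | $y_i$ | $x_i^2$ | $x_i^3$ | $x_i^4$ | $x_i y_i$ | $x_i^2 y_i$ |
|---|---|---|---|---|---|---|
| 0 | 2.1 | 0 | 0 | 0 | 0 | 0 |
| 1 | 7.7 | 1 | 1 | 1 | 7.7 | 7.7 |
| 2 | 13.6 | 4 | 8 | 16 | 27.2 | 54.4 |
| 3 | 27.2 | 9 | 27 | 81 | 81.6 | 244.8 |
| 4 | 40.9 | 16 | 64 | 256 | 163.6 | 654.4 |
| 5 | 61.1 | 25 | 125 | 625 | 305.5 | 1527.5 |
| Σ: 15 | 152.6 | 55 | 225 | 979 | 585.6 | 2488.8 |

$$n = 6, \quad \sum x_i = 15, \quad \sum y_i = 152.6, \quad \sum x_i^2 = 55, \quad \sum x_i^3 = 225, \quad \sum x_i^4 = 979$$

$$\sum x_i y_i = 585.6, \quad \sum x_i^2 y_i = 2488.8, \quad \bar{x} = \frac{15}{6} = 2.5, \quad \bar{y} = \frac{152.6}{6} = 25.433$$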
Polynomial Regression: Example (contd)

The system of simultaneous linear equations:

$$\begin{bmatrix} 6 & 15 & 55 \\ 15 & 55 & 225 \\ 55 & 225 & 979 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} 152.6 \\ 585.6 \\ 2488.8 \end{Bmatrix}$$

$$a_0 = 2.47857, \quad a_1 = 2.35929, \quad a_2 = 1.86071$$

$$y = 2.47857 + 2.35929\,x + 1.86071\,x^2$$

$$S_r = \sum e_i^2 = 3.74657, \qquad S_t = \sum (y_i - \bar{y})^2 = 2513.39$$
| $x_i$ | $y_i$ | $y_{\text{model}}$ | $e_i^2$ | $(y_i - \bar{y})^2$ |
|---|---|---|---|---|
| 0 | 2.1 | 2.4786 | 0.14332 | 544.42889 |
| 1 | 7.7 | 6.6986 | 1.00286 | 314.45929 |
| 2 | 13.6 | 14.64 | 1.08158 | 140.01989 |
| 3 | 27.2 | 26.303 | 0.80491 | 3.12229 |
| 4 | 40.9 | 41.687 | 0.61951 | 239.22809 |
| 5 | 61.1 | 60.793 | 0.09439 | 1272.13489 |
| Σ: 15 | 152.6 | | 3.74657 | 2513.39333 |

The standard error of the estimate:

$$s_{y/x} = \sqrt{\frac{3.74657}{6 - 3}} = 1.12$$

The coefficient of determination:

$$r^2 = \frac{2513.39 - 3.74657}{2513.39} = 0.99851, \qquad r = 0.99925$$