
Talluri Prasanth

2015JULB02046
PGDM-FINANCE

Assignment-2

Regression Model: Predicting airfares on new routes

Scenario:
Several airports have opened in major cities in the USA, opening the market for new
routes. In order to price flights on these routes, a major airline collected information
on 638 air routes in the United States.

Summary
We explore this data and, based on the information available, predict the
average fare on a new route for the next quarter from information on the
previous quarter.
The variables considered for the analysis are
S_CODE
HI
E_CODE
S_INCOME
S_CITY
E_INCOME
E_CITY
S_POP
COUPON
E_POP
NEW
SW
VACATION
SLOT
FARE
GATE
DISTANCE
PAX

#1. Explore the numerical predictors and response (FARE) by creating a
correlation table and examining some scatter plots between FARE and those
predictors. What seems to be the best single predictor of FARE?

Correlations (Pearson Correlation; N = 638 for every pair)

            COUPON    DISTANCE   FARE
COUPON      1         .747**     .497**
DISTANCE    .747**    1          .670**
FARE        .497**    .670**     1

Sig. (2-tailed) = .000 for every pair of different variables.

**. Correlation is significant at the 0.01 level (2-tailed).

Inferences:
1. After exploring the numerical variables, I find a significant correlation
between FARE and DISTANCE (r = .670) and between FARE and COUPON (r = .497).
2. The scatter plot shows that fare increases as distance increases.
3. DISTANCE is the best single predictor of FARE among these variables.
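
The correlation step can be sketched in plain Python. This is only an illustration: the function names `pearson_r` and `best_single_predictor` are hypothetical, the toy distance and fare values are made up to show the call, and only the two correlations with FARE are taken from the table above.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def best_single_predictor(correlations):
    """Name of the predictor with the largest absolute correlation with the response."""
    return max(correlations, key=lambda name: abs(correlations[name]))


# Correlations with FARE as reported in the correlation table
reported = {"COUPON": 0.497, "DISTANCE": 0.670}
print(best_single_predictor(reported))  # DISTANCE

# Toy (made-up) values, not from the airfare data, only to show the call
toy_distance = [300, 600, 900, 1500, 2000]
toy_fare = [90, 120, 170, 210, 260]
print(round(pearson_r(toy_distance, toy_fare), 3))
```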

#2. Explore the categorical predictors (excluding the first 4) by computing the
percentage of flights in each category. Create a pivot table that gives the average
fare in each category.
Solution:
SLOT          Count   Percentage
Controlled    182     28.53%
Free          456     71.47%
Grand Total   638     100%

This says that in 71.47% of all cases the endpoint airport is free and in 28.53% it is
slot controlled.
SW            Count   Percentage
No            444     69.59%
Yes           194     30.41%
Grand Total   638     100%

This says that only 30.41% of all air routes are served by Southwest Airlines
and the remaining 69.59% are served by other airlines.
VACATION      Count   Percentage
No            468     73.35%
Yes           170     26.65%
Grand Total   638     100%

This says that 26.65% of all routes are vacation routes and 73.35% are not
vacation routes.
GATE          Count   Percentage
Constrained   124     19.44%
Free          514     80.56%
Grand Total   638     100%

This says that in 80.56% of all cases the endpoint airport is free and in 19.44% it is
GATE constrained.
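
The percentages follow directly from the category counts. A minimal sketch, using the counts reported above (the function name `category_percentages` is only illustrative):

```python
def category_percentages(counts):
    """Convert a {category: count} mapping into percentages rounded to 2 decimals."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


# Counts per category as reported in the tables above (638 routes in total)
counts = {
    "SLOT": {"Controlled": 182, "Free": 456},
    "SW": {"No": 444, "Yes": 194},
    "VACATION": {"No": 468, "Yes": 170},
    "GATE": {"Constrained": 124, "Free": 514},
}

for name, by_category in counts.items():
    print(name, category_percentages(by_category))
```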

Average of FARE in each category:

SW            Average of FARE
No            188.1827928
Yes           98.38226804
Grand Total   160.8766771

VACATION      Average of FARE
No            173.5525
Yes           125.9808824
Grand Total   160.8766771

SLOT          Average of FARE
Controlled    186.0593956
Free          150.8256798
Grand Total   160.8766771

GATE          Average of FARE
Constrained   193.1290323
Free          153.0959533
Grand Total   160.8766771
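
As a consistency check, the count-weighted average of the category means should reproduce the grand average fare of 160.8766771 for every predictor. The sketch below uses the counts and averages reported above. The variable name of the first pivot table is not shown in the source, so treating it as SW is an assumption, supported by the fact that the SW counts (444 and 194) reproduce the grand average.

```python
def weighted_mean(groups):
    """Overall mean from a list of (count, group_mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n


GRAND_MEAN = 160.8766771  # grand total reported in every pivot table

pivots = {
    "SW": [(444, 188.1827928), (194, 98.38226804)],  # label assumed, see note above
    "VACATION": [(468, 173.5525), (170, 125.9808824)],
    "SLOT": [(182, 186.0593956), (456, 150.8256798)],
    "GATE": [(124, 193.1290323), (514, 153.0959533)],
}

for name, groups in pivots.items():
    overall = weighted_mean(groups)
    # Each difference should be tiny (rounding of the reported averages only)
    print(name, round(overall, 4), abs(overall - GRAND_MEAN) < 1e-4)
```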

#3. Regression model for predicting FARE on a route
Solution:

Categorical predictor   Coding
Vacation                Yes=1, No=0
SW                      Yes=1, No=0
Slot                    Free=1, Controlled=0
Gate                    Free=1, Constrained=0

For predicting the fare:
o The dependent variable is FARE.
o The independent variables are Vacation, New, SW, Slot, Gate, E_POP, S_POP,
E_INCOME, HI, S_INCOME, PAX, COUPON, DISTANCE.

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .887a   .787       .782                35.46850

a. Predictors: (Constant), PAX, NEW, GATE, VACATION, DISTANCE,
S_INCOME, SLOT, HI, E_INCOME, SW, S_POP, E_POP, COUPON

Inferences from Model Summary:

1. Adjusted R Square = 78.2%
About 78.2% of the variance in fare is explained by the independent
variables, after adjusting for the number of predictors.

ANOVAb

Model 1      Sum of Squares   df    Mean Square   F         Sig.
Regression   2896483.535      13    222806.426    177.110   .000a
Residual     785001.174       624   1258.015
Total        3681484.709      637

a. Predictors: (Constant), PAX, NEW, GATE, VACATION, DISTANCE, S_INCOME, SLOT, HI,
E_INCOME, SW, S_POP, E_POP, COUPON
b. Dependent Variable: FARE

Inferences from ANOVA Table:

The F-test evaluates the following:
Null Hypothesis: The model is not helpful in predicting the fare from the independent variables (all slope coefficients are zero).
The p-value (.000) is less than 0.05 (P<0.05).
Decision: Reject the Null Hypothesis. The model is significant in predicting airline fares.
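
The summary statistics can be recomputed from the sums of squares and degrees of freedom in the ANOVA table. A minimal sketch using the standard formulas (the function name `anova_summary` is illustrative; the p-value is not recomputed here):

```python
from math import sqrt


def anova_summary(ss_reg, ss_res, df_reg, df_res):
    """R Square, Adjusted R Square, F and the standard error of the estimate."""
    ss_total = ss_reg + ss_res
    df_total = df_reg + df_res
    ms_reg = ss_reg / df_reg  # mean square for regression
    ms_res = ss_res / df_res  # mean square for residual
    return {
        "r_square": ss_reg / ss_total,
        "adj_r_square": 1 - ms_res / (ss_total / df_total),
        "f": ms_reg / ms_res,
        "std_error": sqrt(ms_res),
    }


# Values from the ANOVA table: regression and residual sums of squares and df
stats = anova_summary(2896483.535, 785001.174, 13, 624)
for name, value in stats.items():
    print(name, round(value, 3))
```

Rounded to three decimals, the results agree with the reported R Square (.787), Adjusted R Square (.782), F (177.110) and standard error of the estimate (35.46850).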

Coefficientsa

              Unstandardized Coefficients   Standardized Coefficients
Model 1       B           Std. Error        Beta                        t         Sig.
(Constant)    12.699      27.379                                        .464      .643
COUPON        3.755       12.194            .010                        .308      .758
NEW           -2.396      1.875             -.024                       -1.277    .202
VACATION      -35.644     3.617             -.207                       -9.855    .000
SW            -40.970     3.744             -.248                       -10.944   .000
HI            .008        .001              .191                        8.510     .000
S_INCOME      .001        .001              .057                        2.334     .020
E_INCOME      .001        .000              .083                        3.666     .000
S_POP         3.401E-6    .000              .135                        5.213     .000
E_POP         4.363E-6    .000              .157                        5.781     .000
SLOT          -16.245     3.847             -.097                       -4.223    .000
GATE          -20.579     4.002             -.107                       -5.143    .000
DISTANCE      .075        .004              .637                        20.948    .000
PAX           .000        .000              -.151                       -5.969    .000


a. Dependent Variable: FARE

Inferences:
For a positive coefficient of an independent variable (beta), the predicted
fare increases as that variable increases, holding the other variables
constant. The amount of change per unit is given by the coefficient.
For a negative coefficient of an independent variable (beta), the predicted
fare decreases as that variable increases, holding the other variables
constant. The amount of change per unit is given by the coefficient.
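
To make the sign interpretation concrete, the change in predicted fare for a change in one variable is simply the coefficient times the change, with everything else held fixed. A small sketch using three unstandardized coefficients from the Coefficients table (the function name `fare_change` is illustrative):

```python
# Unstandardized coefficients (B) taken from the Coefficients table
coefficients = {"DISTANCE": 0.075, "VACATION": -35.644, "SW": -40.970}


def fare_change(predictor, delta):
    """Change in predicted fare when one predictor changes by `delta`, others held fixed."""
    return coefficients[predictor] * delta


print(fare_change("DISTANCE", 100))  # about +7.5 for 100 extra miles
print(fare_change("SW", 1))          # -40.97 when SW changes from No (0) to Yes (1)
print(fare_change("VACATION", 1))    # -35.644 when VACATION changes from No (0) to Yes (1)
```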

Fare predicted for next quarter

F̂ = 12.699 + 3.755*Coupon - 2.396*New - 35.644*Vacation - 40.970*SW + 0.008*HI + 0.001*S_INCOME + 0.001*E_INCOME
    + (3.401E-6)*S_POP + (4.363E-6)*E_POP - 16.245*Slot - 20.579*Gate + 0.075*Distance + 0.00*PAX

#4. Using model (3), predict the average fare on a route with the
following characteristics:
COUPON=1.202, NEW=3, VACATION=No, SW=No, HI=4442.141,
S_INCOME=\$28,760, E_INCOME=\$27,664, S_POP=4,557,004,
E_POP=3,195,503, SLOT=Free, GATE=Free, PAX=12782, DISTANCE=1976 miles.
Solution:

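With Vacation=No and SW=No coded as 0, and Slot=Free and Gate=Free coded as 1:

F̂ = 12.699 + 3.755*1.202 - 2.396*3 - 35.644*0 - 40.970*0 + 0.008*4442.141 + 0.001*28760 + 0.001*27664
    + (3.401E-6)*4557004 + (4.363E-6)*3195503 - 16.245*1 - 20.579*1 + 0.075*1976 + 0.00*12782

F̂ = \$242.80

This uses the coefficients as rounded in the Coefficients table, so the PAX term
(coefficient shown as .000) contributes nothing to the sum.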
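
The substitution can be checked by evaluating the fitted equation in code. This is a sketch under stated assumptions: the coefficients are used exactly as rounded in the Coefficients table (so PAX enters as 0.000), the categorical values are coded as Yes=1/No=0 and Free=1/Controlled or Constrained=0, and E_INCOME is 27,664 as given in the question. The names `predict_fare` and `CODING` are illustrative.

```python
INTERCEPT = 12.699
COEFFICIENTS = {
    "COUPON": 3.755, "NEW": -2.396, "VACATION": -35.644, "SW": -40.970,
    "HI": 0.008, "S_INCOME": 0.001, "E_INCOME": 0.001,
    "S_POP": 3.401e-6, "E_POP": 4.363e-6,
    "SLOT": -16.245, "GATE": -20.579, "DISTANCE": 0.075,
    "PAX": 0.000,  # shown as .000 in the table; the unrounded value is not available
}
# 0/1 coding of the categorical predictors
CODING = {"Yes": 1, "No": 0, "Free": 1, "Controlled": 0, "Constrained": 0}


def predict_fare(route):
    """Evaluate the fitted regression equation for one route."""
    total = INTERCEPT
    for name, coef in COEFFICIENTS.items():
        value = route[name]
        if isinstance(value, str):  # categorical value -> 0/1 dummy
            value = CODING[value]
        total += coef * value
    return total


route = {
    "COUPON": 1.202, "NEW": 3, "VACATION": "No", "SW": "No", "HI": 4442.141,
    "S_INCOME": 28760, "E_INCOME": 27664, "S_POP": 4557004, "E_POP": 3195503,
    "SLOT": "Free", "GATE": "Free", "PAX": 12782, "DISTANCE": 1976,
}
print(round(predict_fare(route), 2))  # 242.8
```

With the coefficients as printed, the terms sum to about 242.80. Because the income, population and PAX coefficients are heavily rounded, a prediction from the unrounded software output could differ by several dollars.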