Multiple Linear Regression

Case Study
Case- Multivariate Linear Regression
Adam, an analytics consultant, works with First Auto Insurance company. His
manager gave him data containing loss amounts and policy-related information
and asked him to identify and quantify the factors responsible for losses in a
multivariate fashion. Adam has no experience of running a multivariate
regression.
Now suppose he approaches you and requests your help to complete the
assignment. Let's help Adam carry out the multivariate regression.
Case- Multivariate Linear Regression
(Rules of Thumb)
In the course of helping Adam complete his task, we will walk him through the following steps:

Variable identification
    Identifying the dependent (response) variable
    Identifying the independent (explanatory) variables
    Variable categorization (e.g. Numeric, Categorical, Discrete, Continuous)
    Creation of Data Dictionary
Response variable exploration
    Distribution analysis
        Percentiles
        Variance
        Frequency distribution
Outlier treatment
    Identify the outliers/threshold limit
    Cap/floor the values at the thresholds
Independent variables analyses
    Identify the prospective independent variables (that can explain the response variable)
    Bivariate analysis of the response variable against independent variables
    Variable treatment/transformation
        Grouping of distinct values/levels
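
The response-variable exploration and outlier treatment steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the loss amounts are made up, the 5th/95th percentile thresholds and the bin width of 500 are assumptions chosen for the example, and the helper names `percentile_thresholds` and `cap_and_floor` are hypothetical. The slides do not say which thresholds Adam should use.

```python
import statistics
from collections import Counter

def percentile_thresholds(values, lower_pct=5, upper_pct=95):
    """Return (floor, cap) at the requested percentiles (assumed defaults)."""
    # With n=100, quantiles() returns the 99 cut points P1..P99.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]

def cap_and_floor(values, floor, cap):
    """Pull every value below the floor up to it and every value above the cap down to it."""
    return [min(max(v, floor), cap) for v in values]

# Made-up loss amounts, for illustration only.
losses = [13, 120, 150, 180, 210, 240, 300, 350, 420, 3500]

# Distribution analysis: variance and a coarse frequency distribution.
variance_before = statistics.variance(losses)
frequency = Counter((v // 500) * 500 for v in losses)  # bins of width 500 (assumed)

# Outlier treatment: identify thresholds, then cap/floor at them.
floor, cap = percentile_thresholds(losses)
treated = cap_and_floor(losses, floor, cap)
variance_after = statistics.variance(treated)

print("frequency by bin start:", dict(sorted(frequency.items())))
print("thresholds:", floor, cap)
print("variance before/after:", variance_before, variance_after)
```

Capping and flooring keeps every policy in the data while limiting the influence of extreme losses. Whether percentile thresholds are appropriate depends on the actual distribution of the response variable.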













Heteroskedasticity
    Check in a univariate manner by individual variables
        Easy for univariate linear regression. Can be done manually.
        Too cumbersome to do manually for the multivariate case.
        The tools (SPSS, R, SAS, etc.) have built-in features to tackle it.
Fitting the regression
    Check for correlation between independent variables
        This is to take care of multicollinearity
    Fix heteroskedasticity
        By a suitable transformation of the response variable (a bit tricky)
        Using built-in features of statistical packages like R
    Variable selection
        Check for the most suitable transformed variable
        Select the transformation giving the best fit
        Reject the statistically insignificant variables
    Fitting the regression
Analysis of results
    Model comparison
    Model performance check
        R²
        Lift/Gains chart and Gini coefficient
        Actual vs Predicted comparison
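
As a rough illustration of the fitting steps above, the sketch below checks the correlation between two independent variables, fits an ordinary least squares regression through the normal equations, and reports R². Everything here is an assumption made for the example: the data are made up, the 0.8 correlation cut-off is arbitrary, and the helper names (`pearson`, `solve`, `fit_ols`, `predict`, `r_squared`) are hypothetical. In practice a package such as R, SAS or SPSS would do this work and would also report significance tests and heteroskedasticity diagnostics, which this sketch leaves out.

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def solve(a, b):
    """Solve the linear system a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] from the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(coefs, rows):
    return [coefs[0] + sum(b * v for b, v in zip(coefs[1:], r)) for r in rows]

def r_squared(actual, predicted):
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Made-up values, for illustration only.
age = [18, 25, 32, 40, 47, 55, 63, 70]
experience = [1, 6, 14, 20, 29, 35, 44, 50]
vehicle_age = [3, 12, 1, 8, 15, 5, 10, 2]
losses = [620, 540, 410, 390, 380, 270, 250, 160]

# Multicollinearity check: drop one of two highly correlated predictors.
r = pearson(age, experience)
predictors = [age, vehicle_age] if abs(r) > 0.8 else [age, experience, vehicle_age]

rows = list(zip(*predictors))
coefs = fit_ols(rows, losses)
r2 = r_squared(losses, predict(coefs, rows))
print("corr(age, experience):", round(r, 3))
print("coefficients:", [round(c, 3) for c in coefs])
print("R squared:", round(r2, 3))
```

Age and years of driving experience are a natural pair to check, since one tends to rise with the other. Which of the two to keep, if any, is a modelling decision the sketch does not settle.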











Multivariate Linear Regression- Data
Data description (known facts):
    Auto insurance policy data
    Contains policy holders and loss amount information (variables):
        Policy Number
        Age
        Years of Driving Experience
        Number of Vehicles
        Gender
        Married
        Vehicle Age
        Fuel Type
        Losses (Dependent/Response Variable)
Next step: Create the Data Dictionary








Snapshot of the data
Multivariate Linear Regression- Data Dictionary
Sl # | Variable Name | Variable Description | Values Stored | Variable Type
1 | Policy Number | Unique Policy Number | ? | ?
2 | Age | Age of Policy holder | ? | ?
3 | Years of Driving Experience | Years of Driving Experience of the Policy holder | ? | ?
4 | Number of Vehicles | Number of Vehicles insured under the policy | ? | ?
5 | Gender | Gender of the Policy holder | ? | ?
6 | Married | Marital status of the Policy holder | ? | ?
7 | Vehicle Age | Age of vehicle insured under the policy | ? | ?
8 | Fuel Type | Fuel type of the vehicle insured | ? | ?
9 | Losses | Insurance amount claimed under the policy | ? | ?
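
One way to fill in the "?" entries is to profile each column of the data. The sketch below is a minimal illustration under stated assumptions: the three sample rows (including the "P-0001" style policy numbers) are made up, only some of the columns are shown, and the typing rules inside the hypothetical helper `profile_column` are simple heuristics rather than a definitive classification.

```python
def profile_column(values):
    """Suggest ('Values Stored', 'Variable Type') for one column (heuristic)."""
    distinct = sorted(set(values))
    if all(isinstance(v, str) for v in distinct):
        # Text that never repeats is treated as an identifier.
        if len(distinct) == len(values):
            return "unique per row", "Identifier"
        kind = "Categorical (Binary)" if len(distinct) == 2 else "Categorical"
        return ", ".join(distinct), kind
    if all(isinstance(v, int) for v in distinct):
        return f"{distinct[0]} to {distinct[-1]}", "Numerical (Discrete)"
    return f"Range: {distinct[0]} to {distinct[-1]}", "Numerical (Continuous)"

# Made-up rows, for illustration only.
rows = [
    {"Policy Number": "P-0001", "Age": 34, "Gender": "F", "Fuel Type": "P", "Losses": 203.5},
    {"Policy Number": "P-0002", "Age": 51, "Gender": "M", "Fuel Type": "D", "Losses": 98.0},
    {"Policy Number": "P-0003", "Age": 34, "Gender": "M", "Fuel Type": "P", "Losses": 412.25},
]

for name in rows[0]:
    column = [row[name] for row in rows]
    stored, var_type = profile_column(column)
    print(f"{name} | {stored} | {var_type}")
```

The suggestions still need a human check. A numeric code that stands for a category, for example, would be mislabelled as discrete by these rules.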
Multivariate Linear Regression- Data Dictionary
Sl # | Variable Name | Variable Description | Values Stored | Variable Type
1 | Policy Number | Unique Policy Number | Unique value identifying the policy | Identifier
2 | Age | Age of Policy holder | 16, 17, ..., 70 | Numerical (Discrete)
3 | Years of Driving Experience | Years of Driving Experience of the Policy holder | 0, 1, ..., 53 | Numerical (Discrete)
4 | Number of Vehicles | Number of Vehicles insured under the policy | 1, 2, 3, 4 | Numerical (Discrete)
5 | Gender | Gender of the Policy holder | F, M | Categorical (Binary)
6 | Married | Marital status of the Policy holder | Married, Single | Categorical (Binary)
7 | Vehicle Age | Age of vehicle insured under the policy | 0, 1, ..., 15 | Numerical (Discrete)
8 | Fuel Type | Fuel type of the vehicle insured | D, P | Categorical (Binary)
9 | Losses | Loss amount claimed under the policy | Range: 13 to 3500 | Numerical (Continuous)
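
Once the dictionary is filled in, it can double as a set of sanity checks on incoming records. The sketch below encodes the "Values Stored" column as allowed values and flags anything outside them. Treating the observed ranges as validation bounds is an assumption made for illustration, as are the sample record, its "P-0001" policy number, and the hypothetical helper `check_record`.

```python
# Allowed values taken from the "Values Stored" column of the data dictionary.
ALLOWED = {
    "Age": range(16, 71),                         # 16, 17, ..., 70
    "Years of Driving Experience": range(0, 54),  # 0, 1, ..., 53
    "Number of Vehicles": range(1, 5),            # 1, 2, 3, 4
    "Gender": {"F", "M"},
    "Married": {"Married", "Single"},
    "Vehicle Age": range(0, 16),                  # 0, 1, ..., 15
    "Fuel Type": {"D", "P"},
}
LOSS_RANGE = (13, 3500)  # observed range of the response variable

def check_record(record):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for field, allowed in ALLOWED.items():
        value = record.get(field)
        if value not in allowed:
            problems.append(f"{field}: unexpected value {value!r}")
    loss = record.get("Losses")
    low, high = LOSS_RANGE
    if not isinstance(loss, (int, float)) or not low <= loss <= high:
        problems.append(f"Losses: {loss!r} outside {low} to {high}")
    return problems

# Made-up record, for illustration only.
record = {
    "Policy Number": "P-0001",
    "Age": 34,
    "Years of Driving Experience": 16,
    "Number of Vehicles": 2,
    "Gender": "F",
    "Married": "Single",
    "Vehicle Age": 4,
    "Fuel Type": "P",
    "Losses": 203.5,
}

print(check_record(record))
print(check_record({**record, "Age": 15, "Fuel Type": "E"}))
```

A check like this is most useful before the outlier treatment step, since a value outside the dictionary may point to a data-entry problem rather than a genuine extreme loss.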
