Project Report - Advanced - Stats - Final PDF

PROJECT 2 - ASSIGNMENT
Factor Hair Revised

Submitted by: Bibin Vadakkekara Bhaskaran (G1 - PGP BABI)
Great Lakes Institute of Management

Advanced Statistics
Table of Contents
List of Tables
List of Figures
1. Project Objective, Background
2. Methodology
3. Solutions/Question Answer (Rubric Based)
3.1 EDA - Basic data summary, Univariate, Bivariate analysis, graphs
3.2 EDA - Check for Outliers and missing values and check the summary of the dataset
3.3 Check for Multicollinearity - Plot the graph based on Multicollinearity
3.4 Simple Linear Regression (with every variable)
3.5 Perform PCA/FA and Interpret the Eigen Values (apply Kaiser Normalization Rule)
3.6 Output Interpretation Tell why only 4 factors are being asked in the questions and tell
whether it is correct in choosing 4 factors. Name the factors with correct explanations
3.7 Create a data frame with a minimum of 5 columns, 4 of which are different factors and
the 5th column is Customer Satisfaction
3.8 Perform Multiple Linear Regression with Customer Satisfaction as the Dependent
Variable and the four factors as Independent Variables
3.9 MLR summary interpretation and significance (R, R2, Adjusted R2, Degrees of
Freedom, f-statistic, coefficients along with p-values)
3.10 Output Interpretation <making it meaningful for everybody>
Appendix 1 – Source Code
List of Tables
Table 1 : Outlier Values & Identification
Table 2 : VIF Variables & Values
Table 3 : Variable Correlation Table
Table 4 : SLR - Equation Table _ Independent variables
Table 5 : Eigen values - Independent variables
Table 6 : Data frame with 4 factors & Customer Satisfaction
Table 7 : Multiple Linear Regression Coeff. - Values
Table 8 : Coefficients vs P Values
List of Figures
Figure 1 : Histogram of Customer Satisfaction
Figure 2 : Box plot of Customer Satisfaction
Figure 3 : Histogram - Independent variables
Figure 4 : Scatter Plot - Bi Variate Analysis - Independent variables
Figure 5 : Box plots - Independent variables
Figure 6 : Correlation Matrix Independent variables
Figure 7 : Scree Plot
Figure 8 : Factor Analysis Diagram - Without Rotation
Figure 9 : Factor Analysis Diagram - with Rotation
Figure 10 : Factor analysis diagram - Cross comparison (before and after rotation)
Figure 11 : Correlation Matrix - with 4 factors
1. Project Objective, Background
The objective of the project is to use the dataset 'Factor-Hair-Revised.csv' to build an optimum regression
model to predict satisfaction.
2. Methodology
An exploratory data analysis of the dataset will be performed with charts & graphs, followed by a check
for outliers and missing values. A multicollinearity check will then be carried out, after which a simple
linear regression will be fitted for the dependent variable against each independent variable in turn. A
PCA/Factor analysis extracting 4 factors will be conducted and the factors/dimensions will be
named accordingly. After the factor analysis, a multiple linear regression model with customer satisfaction
as the dependent variable will be built on the four factors.
3. Solutions/Question Answer (Rubric Based)

3.1 EDA - Basic data summary, Univariate, Bivariate analysis, graphs
• Primary data consists of 100 rows and 13 columns
• Column names indicate an ID column – Since it is not providing any valuable insight into
the data –a new dataset is generated without the “ID” column
• The Structure of the data set indicates that the data is in numeric form – hence no need to
change the data type
• The Summary of data indicates that there is no need to change the scale of the data as well
– however column names need to be changed to meaningful wordings.
• Univariate analysis – Histogram of customer satisfaction (Ref. Figure 1) shows a roughly
normal distribution of data, although a slightly bimodal shape can be observed
Figure 1 : Histogram of Customer Satisfaction
• Univariate analysis – Box plot (Ref. Figure 2) indicates that there are no outliers in the
distribution of the dependent variable.
Figure 2 : Box plot of Customer Satisfaction
• Univariate analysis of all independent variables (Ref. Figure 3) indicates an almost normal
distribution, with the exception of the variable “Warranty & Claims”, which is seen to be
slightly skewed to the right.
Figure 3 :Histogram - Independent variables
• Bivariate analysis (Ref. Figure 4) between the dependent variable and the independent
variables shows both positive and negative relations. The axes are set to a common scale
so that clustering can be compared in a uniform manner. A trend line is added to each
scatter plot from a simple linear regression fit.
Figure 4 : Scatter Plot - Bi Variate Analysis - Independent variables
3.2 EDA - Check for Outliers and missing values and check the summary of the dataset
• Box plots (Ref. Figure 5) are plotted to show the 5-point summary graphically. Plotting
all box plots in a single chart reveals some outliers in the data, namely in the
variables “E-Commerce, Salesforce Image, Order & Billing, Delivery Speed”
Figure 5 : Box plots - Independent variables
• The outlier values are extracted from the data using the boxplot(<dataset>)$out command.
• The following table (Ref. Table 1) lists the outliers within the data
Table 1 : Outlier Values & Identification
Variable Name       Outlier values                  Outlier Row Index numbers
E-Commerce          5.6, 5.7, 5.1, 5.1, 5.1, 5.5    13, 22, 43, 44, 57, 90
Salesforce Image    7.8, 7.8, 8.2                   22, 44, 90
Order & Billing     6.7, 6.5, 2.0, 2.0              24, 48, 84, 92
Delivery Speed      1.6                             84
• It can be observed that for the first two variables the outliers lie on the upper side of the
distribution; for Order & Billing two outliers lie on the upper side and two on the lower side;
and for Delivery Speed there is only one outlier, on the lower side.
• There are no missing values in the dataset
• From the summary of the dataset, Delivery Speed and Advertising are among the lowest-rated
attributes (means of 3.886 and 4.010; E-Commerce is also low at 3.672), whereas Product
Quality, Warranty & Claims & Competitive Pricing are rated highly, with minimum values of
5.0, 4.1 & 3.7 respectively. This indicates that people value these attributes highly.
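The outlier extraction described in this section can be sketched as follows. This is a minimal illustration, not the report's actual run: `ecom_sample` is a hypothetical toy vector built from the first ten E-Commerce ratings printed in Appendix 1 plus the column maximum (5.7), so the flagged positions will not match Table 1. It assumes the default whisker rule of 1.5 times the hinge spread (roughly the IQR) and uses boxplot.stats(), which returns the same $out component as boxplot() without drawing a plot.

```r
# Toy sample: first ten E-Commerce ratings from Appendix 1 plus the maximum (5.7).
# Illustrative only; Table 1 in the report was produced from all 100 rows.
ecom_sample <- c(3.9, 2.7, 3.4, 3.3, 3.4, 2.8, 3.7, 3.3, 3.6, 4.5, 5.7)

# boxplot.stats() applies the same 1.5 * hinge-spread whisker rule as boxplot()
# and returns the points beyond the whiskers in $out.
outlier_values <- boxplot.stats(ecom_sample)$out

# Positions of the flagged values within the sample
outlier_rows <- which(ecom_sample %in% outlier_values)

print(outlier_values)
print(outlier_rows)
```

Because `%in%` matches on value, any row that shares a flagged value is returned, which is why repeated values such as 5.1 appear several times in Table 1.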
3.3 Check for Multicollinearity - Plot the graph based on Multicollinearity

• A variance inflation factor (VIF) check and Bartlett’s test of sphericity are performed as
checks for multicollinearity.
• The VIF values are in the table below (Ref. Table 2)
Table 2 : VIF Variables & Values
Variable VIF Value

Product Quality 1.635797
E-Commerce 2.756694
Technical Support 2.976796
Complaint Resolution 4.730448
Advertising 1.508933
Product Line 3.488185
Salesforce Image 3.439420
Competitive Pricing 1.635000
Warranty & Claims 3.198337
Order & Billing 2.902999
Delivery Speed 6.516014
• From the VIF table, most values lie between 1.5 and 3.5, but Complaint Resolution (4.73)
and Delivery Speed (6.52) are elevated, which points to correlation between the independent
variables.
• Bartlett’s test of sphericity checks whether the correlation matrix of the independent
variables is an identity matrix. At a significance level (alpha) of 0.05, the null hypothesis
states that the independent variables are uncorrelated. The test gives a chi-square value of
619.2726 with 55 degrees of freedom and a P-value of 1.79337e-96 (see Appendix 1), so we
reject the null hypothesis; in other words, the independent variables are correlated.
• Graphically a correlation plot (Ref. Figure 6) is generated to confirm the relations between
independent variables.
Figure 6 : Correlation Matrix Independent variables
• The plot shows evidence of significant relations between variables. Table 3 lists the
most strongly related variable pairs extracted from the plot (Ref. Figure 6)
Table 3 :Variable Correlation Table
Variable 1              Variable 2           Relation Coeff.
E-Commerce              Salesforce Image     0.79
Technical Support       Warranty & Claims    0.80
Complaint Resolution    Order & Billing      0.76
Complaint Resolution    Delivery Speed       0.87
Order & Billing         Delivery Speed       0.75
• This graph along with the VIF and Bartlett’s test indicate the strong presence of Multi-
collinearity among the independent variables.
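Both checks in this section are functions of the predictor correlation matrix, which the sketch below makes explicit. It is a hedged illustration on simulated data (the variables `x1`, `x2`, `x3` are hypothetical, not columns of the report's dataset): the VIF of each predictor is the corresponding diagonal element of the inverse correlation matrix, and the sphericity statistic computed by cortest.bartlett() is reproduced by hand from the determinant of the correlation matrix.

```r
set.seed(1)  # simulated data, not the report's dataset
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.5)  # deliberately correlated with x1
x3 <- rnorm(n)                 # generated independently of x1 and x2
X  <- cbind(x1 = x1, x2 = x2, x3 = x3)

R <- cor(X)   # predictor correlation matrix
p <- ncol(X)

# VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal element of solve(R)
vif_by_hand <- diag(solve(R))

# Bartlett's test of sphericity: H0 says R is an identity matrix
chisq   <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
df      <- p * (p - 1) / 2
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)

print(round(vif_by_hand, 3))
print(c(chisq = chisq, df = df, p_value = p_value))
```

With the report's 11 predictors the same degrees-of-freedom formula gives 11 * 10 / 2 = 55, which matches the value reported above.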
3.4 Simple Linear Regression (with every variable)
• A simple linear regression model is built with the assumption that there is no correlation
between the independent variables.
• A for loop is used to generate the individual models (Ref. Table 4), which are represented
as equations in the table below. Equation format: Dependent Variable = b0 + b1*x
Table 4 : SLR - Equation Table _ Independent variables
X                       b0         b1         Customer Satisfaction = (SLR Model)
Product Quality         3.6759     0.4151     3.6759 + 0.4151 * x
E-Commerce              5.1516     0.4811     5.1516 + 0.4811 * x
Technical Support       6.44757    0.08768    6.44757 + 0.08768 * x
Complaint Resolution    3.680      0.595      3.680 + 0.595 * x
Advertising             5.6259     0.3222     5.6259 + 0.3222 * x
Product Line            4.0220     0.4989     4.0220 + 0.4989 * x
Salesforce Image        4.070      0.556      4.070 + 0.556 * x
Competitive Pricing     8.0386     -0.1607    8.0386 - 0.1607 * x
Warranty & Claims       5.3581     0.2581     5.3581 + 0.2581 * x
Order & Billing         4.0541     0.6695     4.0541 + 0.6695 * x
Delivery Speed          3.2791     0.9364     3.2791 + 0.9364 * x
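The one-model-per-predictor loop behind Table 4 can be sketched as below. This is an illustrative version, not the appendix code: it uses only the first six rows printed by head(mydata) in Appendix 1 and three of the eleven predictors, so the coefficients will not match Table 4. The names `d`, `predictors` and `slr_table` are introduced here for the example.

```r
# First six rows printed by head(mydata) in Appendix 1; three predictors only.
# Coefficients from six rows will not match Table 4, which uses all 100 rows.
d <- data.frame(
  ProdQual     = c(8.5, 8.2, 9.2, 6.4, 9.0, 6.5),
  Ecom         = c(3.9, 2.7, 3.4, 3.3, 3.4, 2.8),
  TechSup      = c(2.5, 5.1, 5.6, 7.0, 5.2, 3.1),
  Satisfaction = c(8.2, 5.7, 8.9, 4.8, 7.1, 4.7)
)

predictors <- setdiff(names(d), "Satisfaction")

# Fit Satisfaction ~ predictor once per predictor and keep b0 and b1
slr_table <- t(sapply(predictors, function(v) {
  fit <- lm(reformulate(v, response = "Satisfaction"), data = d)
  setNames(coef(fit), c("b0", "b1"))
}))

print(round(slr_table, 4))
```

Collecting the coefficients in one matrix gives the b0 and b1 columns of the table directly, one row per predictor.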
3.5 Perform PCA/FA and Interpret the Eigen Values (apply Kaiser Normalization Rule)
• Kaiser rule (called the Kaiser Normalization rule in the rubric) - drop all components with
eigenvalues under 1.0
• Eigen Values are calculated numerically and represented on a Scree-plot (Ref. Figure 7) for
graphical interpretation.
• Eigen value table (Ref. Table 5) is as below
Table 5 : Eigen values - Independent variables
Independent Variable Eigen Value

E-Commerce 2.550896712
Advertising 0.609424095
Product Line 0.551883778
Figure 7 : Scree Plot
• The eigen values indicate a high possibility of dimension reduction: four eigen values are
above 1. Each eigen value belongs to a principal component rather than to an original variable,
so the names Product Quality, E-Commerce, Technical Support & Complaint Resolution attached to
the four largest values are row labels only. According to the Kaiser Normalisation rule, those
components with eigen values below 1 can be dropped.
• The reduction into 4 factors is represented by the scree plot (Ref. Figure 7) in which there are
4 points/dimensions above the critical value of 1.
• A factor analysis diagram (Ref. Figure 8) is created without rotating the factors
Figure 8 : Factor Analysis Diagram - Without Rotation
• Figure 8 indicates that the variables Delivery Speed, Complaint Resolution, Order & Billing
and Product Line load on a single factor, PA1; the remaining variables are grouped similarly
under PA2 and PA3. However, PA4 contains only one variable, Product Quality. To capture more
independent variables in PA4, an orthogonal rotation is applied.
Figure 9 : Factor Analysis Diagram - with Rotation
• After rotation the independent variables are grouped into factors (Ref. Figure 9) whose
members are similar in nature. Product Line from PA1 and Competitive Pricing from PA2 are
grouped into PA4 after rotation. A comparison of the groupings before and after rotation can be
seen below (Ref. Figure 10).
Figure 10 : Factor analysis diagram - Cross comparison (before and after rotation)
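The steps of this section (eigen values, the eigenvalue-above-1 cut-off, and an orthogonal rotation) can be sketched in base R as follows. This is a hedged illustration on simulated data with two built-in latent factors, not the report's dataset. It uses principal-component loadings and stats::varimax() as a stand-in; the factor diagrams in the report carry PA labels, which suggests a principal-axis extraction that this sketch does not reproduce. All variable names here are hypothetical.

```r
set.seed(42)  # simulated data with two latent factors, not the report's dataset
n  <- 100
f1 <- rnorm(n)
f2 <- rnorm(n)
X  <- cbind(
  a1 = f1 + rnorm(n, sd = 0.5), a2 = f1 + rnorm(n, sd = 0.5), a3 = f1 + rnorm(n, sd = 0.5),
  b1 = f2 + rnorm(n, sd = 0.5), b2 = f2 + rnorm(n, sd = 0.5), b3 = f2 + rnorm(n, sd = 0.5)
)

e <- eigen(cor(X))
eigenvalues <- e$values

# Kaiser rule: keep components whose eigenvalue exceeds 1
k <- sum(eigenvalues > 1)

# Unrotated loadings of the retained components
unrotated <- e$vectors[, 1:k] %*% diag(sqrt(eigenvalues[1:k]))
rownames(unrotated) <- colnames(X)

# Orthogonal (varimax) rotation from base R's stats package
rotated <- unclass(varimax(unrotated)$loadings)

print(round(eigenvalues, 3))
print(k)
print(round(rotated, 2))
```

An orthogonal rotation redistributes the loadings across factors but leaves each variable's communality (row sum of squared loadings) unchanged, which is why rotation can change the grouping of variables without changing how much variance the retained factors explain.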
3.6 Output Interpretation Tell why only 4 factors are being asked in the questions and tell whether it is
correct in choosing 4 factors. Name the factors with correct explanations
• The question asks for only 4 factors since the dimensionality reduction groups the
11 variables into 4 factors. The eigen values and the scree plot support this decision
numerically and graphically.
• Thus, choosing 4 factors is correct.
• The 4 factors are named Salesforce Quality, Effect of Marketing, Support & After Sales
Service & Quality-Price ratio.
o Salesforce Quality – its three variables represent the effectiveness of the sales
force team, namely the time for delivery (mostly handled in house or by a 3PL), the
resolution of customer complaints & the order & billing section.
o Effect of Marketing – these variables indicate the effect of marketing on customer
satisfaction levels. The image of the salesforce, advertising and E-commerce options
can thus be clubbed together into a single factor.
o Support & After Sales Service – the variables that fall under this category are
technical support and warranty & claims.
o Quality-Price ratio – the value for money depends on the variables within this category,
namely Product Line, Product Quality and Competitive Pricing.
3.7 Create a data frame with a minimum of 5 columns, 4 of which are different factors and the 5th
column is Customer Satisfaction
• The data frame is created with the 4 factor columns and the dependent variable as the 5th.
The first 6 rows of the data are shown in the table (Ref. Table 6) below.
Table 6 : Data frame with 4 factors & Customer Satisfaction
     Salesforce    Effect of    Support & After    Quality-Price    Customer
     Quality       Marketing    Sales Service      ratio            Satisfaction
1    -0.13389      0.917517     -1.7196            0.091354         8.2
2    1.62976       -2.00901     -0.59636           0.658082         5.7
3    0.363766      0.836174     0.00298            1.375488         8.9
4    -1.22252      -0.54913     1.245473           -0.64421         4.8
5    -0.48542      -0.42762     -0.02698           0.473607         7.1
6    -0.59509      -1.30353     -1.18302           -0.95914         4.7
Figure 11 : Correlation Matrix - with 4 factors
• From the correlation plot (Ref. Figure 11) created it can be clearly seen that the factors are
independent of each other which makes this suitable for performing an MLR.
3.8 Perform Multiple Linear Regression with Customer Satisfaction as the Dependent Variable and the
four factors as Independent Variables
• MLR is performed and the summary is as below (Ref. Table 7)
• The model generated is of the form
• Customer Satisfaction = 6.9180+ (0.57963 * Sales Force Quality) + (0.61978 * Effect of
Marketing) + (0.05692* Support & After Sales Service) + (0.61168* Quality-Price ratio)
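As a worked example of the fitted equation, the sketch below plugs the factor scores of the first respondent from Table 6 into the coefficients of Table 7. It assumes the four score columns of Table 6 follow the same order as the terms of the equation (Salesforce Quality, Effect of Marketing, Support & After Sales Service, Quality-Price ratio); the vector names are introduced here for illustration.

```r
# Coefficients from Table 7 of the report
coefs <- c(intercept           = 6.91800,
           salesforce_quality  = 0.57963,
           effect_of_marketing = 0.61978,
           support_after_sales = 0.05692,
           quality_price_ratio = 0.61168)

# Factor scores of respondent 1 from Table 6 (observed satisfaction: 8.2);
# column order is assumed to match the order of terms in the equation
scores <- c(salesforce_quality  = -0.13389,
            effect_of_marketing = 0.917517,
            support_after_sales = -1.7196,
            quality_price_ratio = 0.091354)

# Prediction = intercept + sum of coefficient * factor score
predicted <- coefs[["intercept"]] + sum(coefs[names(scores)] * scores)
residual  <- 8.2 - predicted

print(round(c(predicted = predicted, residual = residual), 3))
```

The gap between the prediction and the observed rating of 8.2 is the residual for that respondent; a single row says nothing about overall fit, which is summarised by the R squared value in section 3.9.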
Table 7 : Multiple Linear Regression Coeff. - Values
Coefficients:                    Estimate    Std. Error    t value    Pr(>|t|)
Intercept                        6.91800     0.06696       103.317    < 2e-16 ***
Salesforce Quality               0.57963     0.06857       8.453      3.32e-13 ***
Effect of Marketing              0.61978     0.06834       9.070      1.61e-14 ***
Support & After Sales Service    0.05692     0.07173       0.794      0.429
Quality-Price ratio              0.61168     0.07656       7.990      3.16e-12 ***
3.9 MLR summary interpretation and significance (R, R2, Adjusted R2, Degrees of Freedom, f-statistic,
coefficients along with p-values)
• The summary table indicates that all the factors except Support & After Sales Service are
highly significant in the model.
• R squared value is 0.6971 (multiple R is therefore about 0.835) & Adjusted R squared value
is 0.6844
• Degrees of freedom – 4 & 95
• F Statistic – 54.66
• Coefficients along with P values are in the table below (Ref. Table 8)
Table 8 : Coefficients vs P Values
Coefficients:                    Pr(>|t|)
Intercept                        < 2e-16 ***
Salesforce Quality               3.32e-13 ***
Effect of Marketing              1.61e-14 ***
Support & After Sales Service    0.429
Quality-Price ratio              3.16e-12 ***
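The summary figures in this section are tied together by standard formulas, so they can be cross-checked from the reported R squared alone. The sketch below is such a check, assuming n = 100 observations and p = 4 predictors as stated in the report; because the input R squared is rounded to four decimals, the results agree with the reported values only up to rounding.

```r
r_squared <- 0.6971   # reported R squared
n <- 100              # observations
p <- 4                # predictors (the four factors)

multiple_r <- sqrt(r_squared)

# Adjusted R squared penalises R squared for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# Overall F statistic on (p, n - p - 1) degrees of freedom
f_stat   <- (r_squared / p) / ((1 - r_squared) / (n - p - 1))
f_pvalue <- pf(f_stat, df1 = p, df2 = n - p - 1, lower.tail = FALSE)

print(round(c(R = multiple_r, adj_R2 = adj_r_squared, F = f_stat), 4))
print(f_pvalue)
```

The residual degrees of freedom n - p - 1 = 95 and the numerator degrees of freedom p = 4 are the "4 & 95" listed above.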
3.10 Output Interpretation <making it meaningful for everybody>
The dataset consists of ratings given by 100 people on various factors that influence the
customer satisfaction. The preliminary analysis (EDA) of data indicated that there is not much need for
techniques such as outlier correction or scaling. However, a strong multi-collinearity exists amongst
the independent variables and hence a Factor analysis is performed to reduce the dimensions. This
resulted in 4 dimensions/factors using which Multiple Linear Regression equation was generated.
The output indicates that customer satisfaction depends significantly on three of the four
factors, whose coefficients are of similar size: Effect of Marketing (0.620), consisting of
E-Commerce, Advertising and Salesforce Image; Quality – Price ratio (0.612), consisting of Product
Line, Product Quality and Competitive Pricing; and Salesforce Quality (0.580), consisting of Delivery
Speed, Complaint Resolution and Order & Billing. This implies that customer satisfaction will vary
considerably if these factors change.
It can be observed that Support and After Sales Service is the least considerable factor w.r.t.
customer satisfaction: its coefficient (0.057) is not statistically significant (p = 0.429). This can
be interpreted to mean that if the product is of good quality then after sales service will seldom be
required. Marketing of products does affect customer satisfaction, and in this model it carries the
largest coefficient, marginally ahead of the other two major factors. The salesforce image, which is
part of the marketing factor, is directly linked to the components of the Salesforce Quality factor.
In such product companies, the sales force is often handled by 3rd parties, whose market valuation
and image also feed into customer satisfaction levels; for example, changing a delivery vendor to one
with more area coverage and better delivery times should lead to better customer satisfaction levels.
Also, in the case of the Order & Billing variable, the better the customer experience in the
website / store, the better the customer satisfaction. Since E-commerce is a variable in the data
set, it is reasonable to assume that sales are generated through websites, and hence the User
Interface/Experience of the billing and checkout section will affect customer satisfaction.
Appendix 1 – Source Code
> ##### Invoking libraries needed
> library(tidyverse)
> library(corrplot)
> library(psych)
> library(dplyr)
> library(car)
>
> ##### setting working directory #####
> setwd("C:/Users/bibin/OneDrive/Great Lakes/Advanced Stats/Project work Adv Stat/Work Completed")
> ##### reading the file #####
> mydata=read.csv("Factor-Hair-Revised.csv",header=TRUE)
>
> #####3.1 EDA - Basic data summary, Univariate, Bivariate analysis, graphs #####
> head(mydata)
ID ProdQual Ecom TechSup CompRes Advertising ProdLine SalesFImage ComPricing WartyClaim
1 1 8.5 3.9 2.5 5.9 4.8 4.9 6.0 6.8 4.7
2 2 8.2 2.7 5.1 7.2 3.4 7.9 3.1 5.3 5.5
3 3 9.2 3.4 5.6 5.6 5.4 7.4 5.8 4.5 6.2
4 4 6.4 3.3 7.0 3.7 4.7 4.7 4.5 8.8 7.0
5 5 9.0 3.4 5.2 4.6 2.2 6.0 4.5 6.8 6.1
6 6 6.5 2.8 3.1 4.1 4.0 4.3 3.7 8.5 5.1
OrdBilling DelSpeed Satisfaction
1 5.0 3.7 8.2
2 3.9 4.9 5.7
3 5.4 4.5 8.9
4 4.3 3.0 4.8
5 4.5 3.5 7.1
6 3.6 3.3 4.7
> dim(mydata)
[1] 100 13
> #getting to know the structure of the data #####
> str(mydata)
'data.frame': 100 obs. of 13 variables:
$ ID : int 1 2 3 4 5 6 7 8 9 10 ...
$ ProdQual : num 8.5 8.2 9.2 6.4 9 6.5 6.9 6.2 5.8 6.4 ...
$ Ecom : num 3.9 2.7 3.4 3.3 3.4 2.8 3.7 3.3 3.6 4.5 ...
$ TechSup : num 2.5 5.1 5.6 7 5.2 3.1 5 3.9 5.1 5.1 ...
$ CompRes : num 5.9 7.2 5.6 3.7 4.6 4.1 2.6 4.8 6.7 6.1 ...
$ Advertising : num 4.8 3.4 5.4 4.7 2.2 4 2.1 4.6 3.7 4.7 ...
$ ProdLine : num 4.9 7.9 7.4 4.7 6 4.3 2.3 3.6 5.9 5.7 ...
$ SalesFImage : num 6 3.1 5.8 4.5 4.5 3.7 5.4 5.1 5.8 5.7 ...
$ ComPricing : num 6.8 5.3 4.5 8.8 6.8 8.5 8.9 6.9 9.3 8.4 ...
$ WartyClaim : num 4.7 5.5 6.2 7 6.1 5.1 4.8 5.4 5.9 5.4 ...
$ OrdBilling : num 5 3.9 5.4 4.3 4.5 3.6 2.1 4.3 4.4 4.1 ...
$ DelSpeed : num 3.7 4.9 4.5 3 3.5 3.3 2 3.7 4.6 4.4 ...
$ Satisfaction: num 8.2 5.7 8.9 4.8 7.1 4.7 5.7 6.3 7 5.5 ...
> summary(mydata)
ID ProdQual Ecom TechSup CompRes Advertising
Min. : 1.00 Min. : 5.000 Min. :2.200 Min. :1.300 Min. :2.600 Min. :1.900
1st Qu.: 25.75 1st Qu.: 6.575 1st Qu.:3.275 1st Qu.:4.250 1st Qu.:4.600 1st Qu.:3.175
Median : 50.50 Median : 8.000 Median :3.600 Median :5.400 Median :5.450 Median :4.000
Mean : 50.50 Mean : 7.810 Mean :3.672 Mean :5.365 Mean :5.442 Mean :4.010
3rd Qu.: 75.25 3rd Qu.: 9.100 3rd Qu.:3.925 3rd Qu.:6.625 3rd Qu.:6.325 3rd Qu.:4.800
Max. :100.00 Max. :10.000 Max. :5.700 Max. :8.500 Max. :7.800 Max. :6.500
ProdLine SalesFImage ComPricing WartyClaim OrdBilling DelSpeed
Min. :2.300 Min. :2.900 Min. :3.700 Min. :4.100 Min. :2.000 Min. :1.600
1st Qu.:4.700 1st Qu.:4.500 1st Qu.:5.875 1st Qu.:5.400 1st Qu.:3.700 1st Qu.:3.400
Median :5.750 Median :4.900 Median :7.100 Median :6.100 Median :4.400 Median :3.900
Mean :5.805 Mean :5.123 Mean :6.974 Mean :6.043 Mean :4.278 Mean :3.886
3rd Qu.:6.800 3rd Qu.:5.800 3rd Qu.:8.400 3rd Qu.:6.600 3rd Qu.:4.800 3rd Qu.:4.425
Satisfaction
Min. :4.700
1st Qu.:6.000
Median :7.050
Mean :6.918
3rd Qu.:7.625
Max. :9.900
> # ID is not needed for the analysis part, hence removing the id and creating new mydata
> names(mydata)
[1] "ID" "ProdQual" "Ecom" "TechSup" "CompRes" "Advertising"
[7] "ProdLine" "SalesFImage" "ComPricing" "WartyClaim" "OrdBilling" "DelSpeed"
[13] "Satisfaction"
> mydata1=mydata[,c(2:13)]
> head(mydata1)
ProdQual Ecom TechSup CompRes Advertising ProdLine SalesFImage ComPricing WartyClaim OrdBilling
1 8.5 3.9 2.5 5.9 4.8 4.9 6.0 6.8 4.7 5.0
2 8.2 2.7 5.1 7.2 3.4 7.9 3.1 5.3 5.5 3.9
3 9.2 3.4 5.6 5.6 5.4 7.4 5.8 4.5 6.2 5.4
4 6.4 3.3 7.0 3.7 4.7 4.7 4.5 8.8 7.0 4.3
5 9.0 3.4 5.2 4.6 2.2 6.0 4.5 6.8 6.1 4.5
6 6.5 2.8 3.1 4.1 4.0 4.3 3.7 8.5 5.1 3.6
DelSpeed Satisfaction
1 3.7 8.2
2 4.9 5.7
3 4.5 8.9
4 3.0 4.8
5 3.5 7.1
6 3.3 4.7
> str(mydata1)
'data.frame': 100 obs. of 12 variables:
$ ProdQual : num 8.5 8.2 9.2 6.4 9 6.5 6.9 6.2 5.8 6.4 ...
$ Ecom : num 3.9 2.7 3.4 3.3 3.4 2.8 3.7 3.3 3.6 4.5 ...
$ TechSup : num 2.5 5.1 5.6 7 5.2 3.1 5 3.9 5.1 5.1 ...
$ CompRes : num 5.9 7.2 5.6 3.7 4.6 4.1 2.6 4.8 6.7 6.1 ...
$ Advertising : num 4.8 3.4 5.4 4.7 2.2 4 2.1 4.6 3.7 4.7 ...
$ ProdLine : num 4.9 7.9 7.4 4.7 6 4.3 2.3 3.6 5.9 5.7 ...
$ SalesFImage : num 6 3.1 5.8 4.5 4.5 3.7 5.4 5.1 5.8 5.7 ...
$ ComPricing : num 6.8 5.3 4.5 8.8 6.8 8.5 8.9 6.9 9.3 8.4 ...
$ WartyClaim : num 4.7 5.5 6.2 7 6.1 5.1 4.8 5.4 5.9 5.4 ...
$ OrdBilling : num 5 3.9 5.4 4.3 4.5 3.6 2.1 4.3 4.4 4.1 ...
$ DelSpeed : num 3.7 4.9 4.5 3 3.5 3.3 2 3.7 4.6 4.4 ...
$ Satisfaction: num 8.2 5.7 8.9 4.8 7.1 4.7 5.7 6.3 7 5.5 ...
> summary(mydata1)
ProdQual Ecom TechSup CompRes Advertising ProdLine
Min. : 5.000 Min. :2.200 Min. :1.300 Min. :2.600 Min. :1.900 Min. :2.300
1st Qu.: 6.575 1st Qu.:3.275 1st Qu.:4.250 1st Qu.:4.600 1st Qu.:3.175 1st Qu.:4.700
Median : 8.000 Median :3.600 Median :5.400 Median :5.450 Median :4.000 Median :5.750
Mean : 7.810 Mean :3.672 Mean :5.365 Mean :5.442 Mean :4.010 Mean :5.805
3rd Qu.: 9.100 3rd Qu.:3.925 3rd Qu.:6.625 3rd Qu.:6.325 3rd Qu.:4.800 3rd Qu.:6.800
SalesFImage ComPricing WartyClaim OrdBilling DelSpeed Satisfaction
Min. :2.900 Min. :3.700 Min. :4.100 Min. :2.000 Min. :1.600 Min. :4.700
1st Qu.:4.500 1st Qu.:5.875 1st Qu.:5.400 1st Qu.:3.700 1st Qu.:3.400 1st Qu.:6.000
Median :4.900 Median :7.100 Median :6.100 Median :4.400 Median :3.900 Median :7.050
Mean :5.123 Mean :6.974 Mean :6.043 Mean :4.278 Mean :3.886 Mean :6.918
3rd Qu.:5.800 3rd Qu.:8.400 3rd Qu.:6.600 3rd Qu.:4.800 3rd Qu.:4.425 3rd Qu.:7.625
> colnames(mydata1)
[1] "ProdQual" "Ecom" "TechSup" "CompRes" "Advertising" "ProdLine"
[7] "SalesFImage" "ComPricing" "WartyClaim" "OrdBilling" "DelSpeed" "Satisfaction"
> #changing the column names for clear understanding
> newnames=c("Product Quality","E-Commerce","Technical Support","Complaint Resolution" ,
+ "Advertising","Product Line","Salesforce Image","Competitive Pricing" ,
+ "Warranty & Claims","Order & Billing","Delivery Speed","Customer Satisfaction")
> #replacing the previous column names with the new names #####
> colnames(mydata1)=c(newnames)
> colnames(mydata1)
[1] "Product Quality" "E-Commerce" "Technical Support" "Complaint Resolution"
[5] "Advertising" "Product Line" "Salesforce Image" "Competitive Pricing"
[9] "Warranty & Claims" "Order & Billing" "Delivery Speed" "Customer Satisfaction"
> summary(mydata1)
Product Quality E-Commerce Technical Support Complaint Resolution Advertising
Min. : 5.000 Min. :2.200 Min. :1.300 Min. :2.600 Min. :1.900
1st Qu.: 6.575 1st Qu.:3.275 1st Qu.:4.250 1st Qu.:4.600 1st Qu.:3.175
Median : 8.000 Median :3.600 Median :5.400 Median :5.450 Median :4.000
Mean : 7.810 Mean :3.672 Mean :5.365 Mean :5.442 Mean :4.010
3rd Qu.: 9.100 3rd Qu.:3.925 3rd Qu.:6.625 3rd Qu.:6.325 3rd Qu.:4.800
Max. :10.000 Max. :5.700 Max. :8.500 Max. :7.800 Max. :6.500
Product Line Salesforce Image Competitive Pricing Warranty & Claims Order & Billing
Min. :2.300 Min. :2.900 Min. :3.700 Min. :4.100 Min. :2.000
1st Qu.:4.700 1st Qu.:4.500 1st Qu.:5.875 1st Qu.:5.400 1st Qu.:3.700
Median :5.750 Median :4.900 Median :7.100 Median :6.100 Median :4.400
Mean :5.805 Mean :5.123 Mean :6.974 Mean :6.043 Mean :4.278
3rd Qu.:6.800 3rd Qu.:5.800 3rd Qu.:8.400 3rd Qu.:6.600 3rd Qu.:4.800
Delivery Speed Customer Satisfaction
Min. :1.600 Min. :4.700
1st Qu.:3.400 1st Qu.:6.000
Median :3.900 Median :7.050
Mean :3.886 Mean :6.918
3rd Qu.:4.425 3rd Qu.:7.625
Max. :5.500 Max. :9.900
> attach(mydata1)
> #all the data values are within a same range - hence no need for any scaling process
>
> ###### histogram - Dependant variable #####
> hist(`Customer Satisfaction`,labels=T,xlim=c(4,11),ylim=c(0,20),col="turquoise",border=4)
> # box plot for the dependant variable ####
> boxplot(`Customer Satisfaction`,horizontal = TRUE, col="turquoise", main="Box Plot of Customer Satisfaction"
+ , xlab="Level of Customer Satisfaction",ylim=c(4,10))
>
> ##### histogram - independant variables one by one ####
>
> dim(mydata1)
[1] 100 12
> # 12 columns -> 4 rows 3 columns division of graph space
> dev.off()
null device
1
> par("mar")
[1] 5.1 4.1 4.1 2.1
> par(mar=c(2,2,1,1))
> par(mfrow=c(4,3))
>
> hist(mydata1$`Product Quality`,main=colnames(mydata1[1]),xlab="Value",col = "turquoise",border=4)
> ## creating loop for other histograms
> mydata1[,12]
[1] 8.2 5.7 8.9 4.8 7.1 4.7 5.7 6.3 7.0 5.5 7.4 6.0 8.4 7.6 8.0 6.6 6.4 7.4 6.8 7.6 5.4 9.9 7.0
[24] 8.6 4.8 6.6 6.3 5.4 6.3 5.4 6.1 6.4 5.4 7.3 6.3 5.4 7.1 8.7 7.6 6.0 7.0 7.6 8.9 7.6 5.5 7.4
[47] 7.1 7.6 8.7 8.6 5.4 5.7 8.7 6.1 7.3 7.7 9.0 8.2 7.1 7.9 6.6 8.0 6.3 6.0 5.4 7.6 6.4 6.1 5.2
[70] 6.6 7.6 5.8 7.9 8.6 8.2 7.1 6.4 7.6 8.9 5.7 7.1 7.4 6.6 5.0 8.2 5.2 5.2 8.2 7.3 8.2 7.4 4.8
[93] 7.6 8.9 7.7 7.3 6.3 5.4 6.4 6.4
> # loop over the remaining independent variables (columns 2 to 11 of mydata1)
> for (i in c(2:11))
+ {
+ hist(mydata1[,i],main =colnames(mydata1[i]),xlab = "Value",col="turquoise",border=4)
+
+ }
>
> ##### Bivariate Analysis #####
>
> #scatter plot of independent variables vs dependent variable
> dim(mydata1)
[1] 100 12
> dev.off()
null device
1
> par("mar")
[1] 5.1 4.1 4.1 2.1
> par(mar=c(3,3,1,1))
> par(mfrow=c(4,3))
> for (i in c(1:11))
+ {
+ # independent variable on the x-axis, Customer Satisfaction on the y-axis (matches the axis labels)
+ plot(mydata1$`Customer Satisfaction`~mydata1[,i],xlim=c(0,10),ylim=c(0,10),xlab=newnames[i],
+ ylab="Customer Satisfaction",col="blue")
+ abline(lm(formula = mydata1$`Customer Satisfaction`~mydata1[,i]),col="red")
+ }
>
> #####3.2 EDA - Check for Outliers and missing values and check the summary of the dataset #####
>
> #similarly Box plot with loop
> dev.off()
null device
1
> par("mar")
[1] 5.1 4.1 4.1 2.1
> par(mar=c(2,2,1,1))
> par(mfrow=c(4,3))
> for(i in c(1:12))
+ {
+ boxplot(mydata1[,i],horizontal = TRUE,xlab=colnames(mydata1[i]),las=2,main="box plot")
+ }
> #creating multiple box plots
> dev.off()
null device
1
> par(mar=c(9,3,1,1))
>
> boxplot(mydata1[,-12], las = 2, names = colnames(mydata1[-12]), cex.axis = 1,col="turquoise",border=2)
> #outliers are present in E-Commerce, Salesforce Image, Order & Billing, Delivery Speed
>
> ##### finding outliers #####
> outliers.Ecom=boxplot(mydata1$`E-Commerce`)$out
> which(mydata1$`E-Commerce` %in% outliers.Ecom)
[1] 13 22 43 44 57 90
> outliers.Ecom
[1] 5.6 5.7 5.1 5.1 5.1 5.5
>
> outliers.Sales.F=boxplot(mydata1$`Salesforce Image`)$out
> which(mydata1$`Salesforce Image` %in% outliers.Sales.F)
[1] 22 44 90
> outliers.Sales.F
[1] 7.8 7.8 8.2
>
> outliers.ord.bil=boxplot(mydata1$`Order & Billing`)$out
> which(mydata1$`Order & Billing` %in% outliers.ord.bil)
[1] 24 48 84 92
> outliers.ord.bil
[1] 6.7 6.5 2.0 2.0
>
> outliers.del.spd=boxplot(mydata1$`Delivery Speed`)$out
> which(mydata1$`Delivery Speed` %in% outliers.del.spd)
[1] 84
> outliers.del.spd
[1] 1.6
>
> #summary of data set
> summary(mydata1)
Product Quality E-Commerce Technical Support Complaint Resolution Advertising
Min. : 5.000 Min. :2.200 Min. :1.300 Min. :2.600 Min. :1.900
1st Qu.: 6.575 1st Qu.:3.275 1st Qu.:4.250 1st Qu.:4.600 1st Qu.:3.175
Median : 8.000 Median :3.600 Median :5.400 Median :5.450 Median :4.000
Mean : 7.810 Mean :3.672 Mean :5.365 Mean :5.442 Mean :4.010
3rd Qu.: 9.100 3rd Qu.:3.925 3rd Qu.:6.625 3rd Qu.:6.325 3rd Qu.:4.800
Max. :10.000 Max. :5.700 Max. :8.500 Max. :7.800 Max. :6.500
Product Line Salesforce Image Competitive Pricing Warranty & Claims Order & Billing
Min. :2.300 Min. :2.900 Min. :3.700 Min. :4.100 Min. :2.000
1st Qu.:4.700 1st Qu.:4.500 1st Qu.:5.875 1st Qu.:5.400 1st Qu.:3.700
Median :5.750 Median :4.900 Median :7.100 Median :6.100 Median :4.400
Mean :5.805 Mean :5.123 Mean :6.974 Mean :6.043 Mean :4.278
3rd Qu.:6.800 3rd Qu.:5.800 3rd Qu.:8.400 3rd Qu.:6.600 3rd Qu.:4.800
Delivery Speed Customer Satisfaction
Min. :1.600 Min. :4.700
1st Qu.:3.400 1st Qu.:6.000
Median :3.900 Median :7.050
Mean :3.886 Mean :6.918
3rd Qu.:4.425 3rd Qu.:7.625
Max. :5.500 Max. :9.900
>
> ##### Missing values ######
>
> sum(is.na(mydata1))
[1] 0
> #since sum is 0 it indicates that there is no missing values
>
>
> #####3.3 Check for Multicollinearity - Plot the graph based on Multicollinearity#####
>
> #####multicoll. using VIF check #####
> # response referenced by column name so that "." expands to the 11 predictors only
> vifmodel=lm(`Customer Satisfaction`~.,data=mydata1)
> mydata.vif.matrix=vif(vifmodel)
>
>
> ##### multicoll. using bartlett test #####
> cor.rel.mat = cor(mydata1[c(1:11)])
> cortest.bartlett(cor.rel.mat,100)
$chisq
[1] 619.2726
$p.value
[1] 1.79337e-96
$df
[1] 55
>
> #####creating a correlation matrix###
> dev.off()
null device
1
>
> round(cor.rel.mat,4)
Product Quality E-Commerce Technical Support Complaint Resolution Advertising
Product Quality 1.0000 -0.1372 0.0956 0.1064 -0.0535
E-Commerce -0.1372 1.0000 0.0009 0.1402 0.4299
Technical Support 0.0956 0.0009 1.0000 0.0967 -0.0629
Complaint Resolution 0.1064 0.1402 0.0967 1.0000 0.1969
Advertising -0.0535 0.4299 -0.0629 0.1969 1.0000
Product Line 0.4775 -0.0527 0.1926 0.5614 -0.0116
Salesforce Image -0.1518 0.7915 0.0170 0.2298 0.5422
Competitive Pricing -0.4013 0.2295 -0.2708 -0.1280 0.1342
Warranty & Claims 0.0883 0.0519 0.7972 0.1404 0.0108
Order & Billing 0.1043 0.1561 0.0801 0.7569 0.1842
Delivery Speed 0.0277 0.1916 0.0254 0.8651 0.2759
Product Line Salesforce Image Competitive Pricing Warranty & Claims
Product Quality 0.4775 -0.1518 -0.4013 0.0883
E-Commerce -0.0527 0.7915 0.2295 0.0519
Technical Support 0.1926 0.0170 -0.2708 0.7972
Complaint Resolution 0.5614 0.2298 -0.1280 0.1404
Advertising -0.0116 0.5422 0.1342 0.0108
Product Line 1.0000 -0.0613 -0.4949 0.2731
Salesforce Image -0.0613 1.0000 0.2646 0.1075
Competitive Pricing -0.4949 0.2646 1.0000 -0.2450
Warranty & Claims 0.2731 0.1075 -0.2450 1.0000
Order & Billing 0.4244 0.1951 -0.1146 0.1971
Delivery Speed 0.6019 0.2716 -0.0729 0.1094
Order & Billing Delivery Speed
Product Quality 0.1043 0.0277
E-Commerce 0.1561 0.1916
Technical Support 0.0801 0.0254
Complaint Resolution 0.7569 0.8651
Advertising 0.1842 0.2759
Product Line 0.4244 0.6019
Salesforce Image 0.1951 0.2716
Competitive Pricing -0.1146 -0.0729
Warranty & Claims 0.1971 0.1094
Order & Billing 1.0000 0.7510
Delivery Speed 0.7510 1.0000
> par(mar=c(1,2,6,1))
> corrplot(cor.rel.mat, type="upper",method="number",tl.col="blue")
>
> #several pairs of independent variables are correlated with each other
>
>
> #####3.4 Simple Linear Regression (with every variable)####
>
> colnames(mydata1)
[1] "Product Quality" "E-Commerce" "Technical Support" "Complaint Resolution"
[5] "Advertising" "Product Line" "Salesforce Image" "Competitive Pricing"
[9] "Warranty & Claims" "Order & Billing" "Delivery Speed" "Customer Satisfaction"
>
> model.prod.quality=lm(`Customer Satisfaction`~`Product Quality`,data=mydata1)
>
> for (i in c(1:11))
+ {
+ model=lm(mydata1$`Customer Satisfaction`~mydata1[,i],data=mydata1)
+ print(newnames[i])
+ print(model)
+ }
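[1] "Product Quality"
Call:
lm(formula = mydata1$`Customer Satisfaction` ~ mydata1[, i],
data = mydata1)
Coefficients:
(Intercept) mydata1[, i]
3.6759 0.4151
[1] "E-Commerce"
Call:
lm(formula = mydata1$`Customer Satisfaction` ~ mydata1[, i],
data = mydata1)
Coefficients:
(Intercept) mydata1[, i]
5.1516 0.4811
[1] "Technical Support"
Call:
lm(formula = mydata1$`Customer Satisfaction` ~ mydata1[, i],
data = mydata1)
Coefficients:
(Intercept) mydata1[, i]
6.44757 0.08768
[1] "Complaint Resolution"
Call:
lm(formula = mydata1$`Customer Satisfaction` ~ mydata1[, i],
data = mydata1)
Coefficients:
(Intercept) mydata1[, i]
3.680 0.595
[1] "Advertising"
Call:
lm(formula = mydata1$`Customer Satisfaction` ~ mydata1[, i],
data = mydata1)
Coefficients:
(Intercept) mydata1[, i]
5.6259 0.3222
[1] "Product Line"
Call:
lm(formula = mydata1$`Customer Satisfaction` ~ mydata1[, i],
data = mydata1)
Coefficients:
(Intercept) mydata1[, i]
4.0220 0.4989
[1] "Salesforce Image"
Call:
lm(formula = mydata1$`Customer Satisfaction` ~ mydata1[, i],
data = mydata1)
Coefficients:
(Intercept) mydata1[, i]
4.070 0.556
[1] "Competitive Pricing"
Call:
lm(formula = mydata1$`Customer Satisfaction` ~ mydata1[, i],
data = mydata1)
Coefficients:
(Intercept) mydata1[, i]
8.0386 -0.1607
[1] "Warranty & Claims"
Call:
lm(formula = mydata1$`Customer Satisfaction` ~ mydata1[, i],
data = mydata1)
Coefficients:
(Intercept) mydata1[, i]
5.3581 0.2581
[1] "Order & Billing"
Call:
lm(formula = mydata1$`Customer Satisfaction` ~ mydata1[, i],
data = mydata1)
Coefficients:
(Intercept) mydata1[, i]
4.0541 0.6695
[1] "Delivery Speed"
Call:
lm(formula = mydata1$`Customer Satisfaction` ~ mydata1[, i],
data = mydata1)
Coefficients:
(Intercept) mydata1[, i]
3.2791 0.9364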
>
>
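The loop above prints every model under the same `mydata1[, i]` label. One possible alternative, sketched here on the built-in `mtcars` data rather than the report's dataset, builds each formula from the column name with `reformulate()` and collects the coefficients in a single table. The names `dat`, `fits` and `coef_table` are illustrative only.

```r
# Built-in data for illustration; "mpg" plays the role of the response.
dat <- mtcars[, c("mpg", "disp", "hp", "wt")]
predictors <- setdiff(names(dat), "mpg")

# One simple regression per predictor; reformulate() builds the formula
# from the column name, so each printed call names the actual variable.
fits <- lapply(predictors, function(v) {
  lm(reformulate(v, response = "mpg"), data = dat)
})
names(fits) <- predictors

# Collect intercept and slope of every model into one table.
coef_table <- t(sapply(fits, coef))
colnames(coef_table) <- c("intercept", "slope")
print(round(coef_table, 4))
```

Column names that contain spaces, such as those in `mydata1`, would need to be wrapped in backticks before being passed to `reformulate()`.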
> #####3.5 Perform PCA/FA and Interpret the Eigen Values (apply Kaiser Normalization Rule) ####
>
> ## Check sampling adequacy with the Kaiser-Meyer-Olkin (KMO) test
> KMO(cor.rel.mat)
Kaiser-Meyer-Olkin factor adequacy
Call: KMO(r = cor.rel.mat)
Overall MSA = 0.65
MSA for each item =
Product Quality E-Commerce Technical Support Complaint Resolution
0.51 0.63 0.52 0.79
Advertising Product Line Salesforce Image Competitive Pricing
0.78 0.62 0.62 0.75
Warranty & Claims Order & Billing Delivery Speed
0.51 0.76 0.67
>
> ## Eigen Value calculation
> cor.rel.mat
Product Quality E-Commerce Technical Support Complaint Resolution
Product Quality 1.00000000 -0.1371632174 0.0956004542 0.1063700
E-Commerce -0.13716322 1.0000000000 0.0008667887 0.1401793
Technical Support 0.09560045 0.0008667887 1.0000000000 0.0966566
Complaint Resolution 0.10637000 0.1401792611 0.0966565978 1.0000000
Advertising -0.05347313 0.4298907110 -0.0628700668 0.1969168
Product Line 0.47749341 -0.0526878383 0.1926254565 0.5614170
Competitive Pricing -0.40128188 0.2294624014 -0.2707866821 -0.1279543
Warranty & Claims 0.08831231 0.0518981915 0.7971679258 0.1404083
Order & Billing 0.10430307 0.1561473316 0.0801018246 0.7568686
Delivery Speed 0.02771800 0.1916360683 0.0254406935 0.8650917
Advertising Product Line Salesforce Image Competitive Pricing
Product Quality -0.05347313 0.47749341 -0.15181287 -0.40128188
E-Commerce 0.42989071 -0.05268784 0.79154371 0.22946240
Technical Support -0.06287007 0.19262546 0.01699054 -0.27078668
Complaint Resolution 0.19691685 0.56141695 0.22975176 -0.12795425
Advertising 1.00000000 -0.01155082 0.54220366 0.13421689
Product Line -0.01155082 1.00000000 -0.06131553 -0.49494840
Salesforce Image 0.54220366 -0.06131553 1.00000000 0.26459655
Competitive Pricing 0.13421689 -0.49494840 0.26459655 1.00000000
Warranty & Claims 0.01079207 0.27307753 0.10745534 -0.24498605
Order & Billing 0.18423559 0.42440825 0.19512741 -0.11456703
Delivery Speed 0.27586308 0.60185021 0.27155126 -0.07287173
Warranty & Claims Order & Billing Delivery Speed
Product Quality 0.08831231 0.10430307 0.02771800
E-Commerce 0.05189819 0.15614733 0.19163607
Technical Support 0.79716793 0.08010182 0.02544069
Complaint Resolution 0.14040830 0.75686859 0.86509170
Advertising 0.01079207 0.18423559 0.27586308
Product Line 0.27307753 0.42440825 0.60185021
Salesforce Image 0.10745534 0.19512741 0.27155126
Competitive Pricing -0.24498605 -0.11456703 -0.07287173
Warranty & Claims 1.00000000 0.19706512 0.10939460
Order & Billing 0.19706512 1.00000000 0.75100307
Delivery Speed 0.10939460 0.75100307 1.00000000
> eigen1 = eigen(cor.rel.mat)
> eigen.values = eigen1$values
> eigen.values
[1] 3.42697133 2.55089671 1.69097648 1.08655606 0.60942409 0.55188378 0.40151815 0.24695154
[9] 0.20355327 0.13284158 0.09842702
> write.csv(eigen.values, "eigen2.csv")
> ## Plotting the scree plot and adding lines.
> dev.off()
null device
1
> plot(eigen.values, main = "Scree Plot of Eigen Values", xlab = "No. of Factors", ylab = "E.Values", col = "red",pch=20,bg="red",lwd=2,cex=2)
> lines(eigen.values, col = "blue",lwd = 2)
> abline(h = 1, col = "red",lwd = 2)
>
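The red line at 1 in the scree plot marks the Kaiser criterion, which retains the components whose eigenvalue exceeds 1. A short sketch applying it to the eigenvalues printed above (the variable names `ev`, `n_keep` and `cum_var` are illustrative):

```r
# Eigenvalues as printed above.
ev <- c(3.42697133, 2.55089671, 1.69097648, 1.08655606, 0.60942409,
        0.55188378, 0.40151815, 0.24695154, 0.20355327, 0.13284158,
        0.09842702)

n_keep <- sum(ev > 1)      # Kaiser criterion: keep eigenvalues > 1
prop_var <- ev / sum(ev)   # share of total variance per component
cum_var <- cumsum(prop_var)

print(n_keep)
print(round(cum_var[n_keep], 3))
```

The eigenvalues of a correlation matrix sum to the number of variables (11 here), so each share is simply the eigenvalue divided by 11. These are principal-component shares; the principal-axis factor solution that follows reports its own, somewhat lower, cumulative variance.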
> #four eigenvalues exceed 1 (3.43, 2.55, 1.69, 1.09), so the scree plot supports a 4-factor reduction
> ## Non Rotating - 1
>
> non.rotate.four.factors = fa(r= mydata1[c(1:11)], nfactors =4, rotate ="none", fm ="pa")
> print(non.rotate.four.factors)
Factor Analysis using method = pa
Call: fa(r = mydata1[c(1:11)], nfactors = 4, rotate = "none",
fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
PA1 PA2 PA3 PA4 h2 u2 com
Product Quality 0.20 -0.41 -0.06 0.46 0.42 0.576 2.4
E-Commerce 0.29 0.66 0.27 0.22 0.64 0.362 2.0
Technical Support 0.28 -0.38 0.74 -0.17 0.79 0.205 1.9
Complaint Resolution 0.86 0.01 -0.26 -0.18 0.84 0.157 1.3
Advertising 0.29 0.46 0.08 0.13 0.31 0.686 1.9
Product Line 0.69 -0.45 -0.14 0.31 0.80 0.200 2.3
Salesforce Image 0.39 0.80 0.35 0.25 0.98 0.021 2.1
Competitive Pricing -0.23 0.55 -0.04 -0.29 0.44 0.557 1.9
Warranty & Claims 0.38 -0.32 0.74 -0.15 0.81 0.186 2.0
Order & Billing 0.75 0.02 -0.18 -0.18 0.62 0.378 1.2
Delivery Speed 0.90 0.10 -0.30 -0.20 0.94 0.058 1.4
PA1 PA2 PA3 PA4

SS loadings 3.21 2.22 1.50 0.68
Proportion Var 0.29 0.20 0.14 0.06
Cumulative Var 0.29 0.49 0.63 0.69
Proportion Explained 0.42 0.29 0.20 0.09
Cumulative Proportion 0.42 0.71 0.91 1.00
Mean item complexity = 1.9
Test of the hypothesis that 4 factors are sufficient.
The degrees of freedom for the null model are 55 and the objective function was 6.55 with Chi Square of 619.27
The degrees of freedom for the model are 17 and the objective function was 0.33
The root mean square of the residuals (RMSR) is 0.02
The df corrected root mean square of the residuals is 0.03
The harmonic number of observations is 100 with the empirical chi square 3.19 with prob < 1
The total number of observations was 100 with Likelihood Chi Square = 30.27 with prob < 0.024
Tucker Lewis Index of factoring reliability = 0.921
RMSEA index = 0.096 and the 90 % confidence intervals are 0.032 0.139
BIC = -48.01
Fit based upon off diagonal values = 1
Measures of factor score adequacy
PA1 PA2 PA3 PA4
Correlation of (regression) scores with factors 0.98 0.97 0.95 0.88
Multiple R square of scores with factors 0.96 0.95 0.91 0.78
Minimum correlation of possible factor scores 0.92 0.90 0.82 0.56
>
> #keeping the cut off at 0.3
> dev.off()
null device
1
> value.loading.non.rotate=print(non.rotate.four.factors$loadings,cutoff=0.3)
Loadings:
                         PA1     PA2     PA3     PA4
Product Quality               -0.408           0.463
E-Commerce                     0.659
Technical Support             -0.381   0.738
Advertising                    0.457
Product Line           0.689  -0.453           0.315
Warranty & Claims      0.379  -0.324   0.735
Delivery Speed         0.895          -0.303
PA1 PA2 PA3 PA4
SS loadings 3.215 2.223 1.499 0.678
Proportion Var 0.292 0.202 0.136 0.062
Cumulative Var 0.292 0.494 0.631 0.692
> fa.diagram(value.loading.non.rotate,main="Factor Analysis Diagram",col="Red",digits=1,rsize = 0.2,e.size=0.09,side=1,marg=c(0.5,.5,1,0))
>
> ## Rotating - 1
> dev.off()
null device
1
> rotate.four.factors= fa(r= mydata1[c(1:11)], nfactors =4, rotate ="varimax", fm ="pa")
> print(rotate.four.factors)
Factor Analysis using method = pa
Call: fa(r = mydata1[c(1:11)], nfactors = 4, rotate = "varimax",
fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
PA1 PA2 PA3 PA4 h2 u2 com
Product Quality 0.02 -0.07 0.02 0.65 0.42 0.576 1.0
E-Commerce 0.07 0.79 0.03 -0.11 0.64 0.362 1.1
Technical Support 0.02 -0.03 0.88 0.12 0.79 0.205 1.0
Complaint Resolution 0.90 0.13 0.05 0.13 0.84 0.157 1.1
Advertising 0.17 0.53 -0.04 -0.06 0.31 0.686 1.2
Product Line 0.53 -0.04 0.13 0.71 0.80 0.200 1.9
Salesforce Image 0.12 0.97 0.06 -0.13 0.98 0.021 1.1
Competitive Pricing -0.08 0.21 -0.21 -0.59 0.44 0.557 1.6
Warranty & Claims 0.10 0.06 0.89 0.13 0.81 0.186 1.1
Order & Billing 0.77 0.13 0.09 0.09 0.62 0.378 1.1
Delivery Speed 0.95 0.19 0.00 0.09 0.94 0.058 1.1
PA1 PA2 PA3 PA4
SS loadings 2.63 1.97 1.64 1.37
Proportion Var 0.24 0.18 0.15 0.12
Cumulative Var 0.24 0.42 0.57 0.69
Proportion Explained 0.35 0.26 0.22 0.18
Cumulative Proportion 0.35 0.60 0.82 1.00
Mean item complexity = 1.2
Test of the hypothesis that 4 factors are sufficient.
The degrees of freedom for the null model are 55 and the objective function was 6.55 with Chi Square of 619.27
The degrees of freedom for the model are 17 and the objective function was 0.33
The root mean square of the residuals (RMSR) is 0.02
The df corrected root mean square of the residuals is 0.03
The harmonic number of observations is 100 with the empirical chi square 3.19 with prob < 1
The total number of observations was 100 with Likelihood Chi Square = 30.27 with prob < 0.024
Tucker Lewis Index of factoring reliability = 0.921
RMSEA index = 0.096 and the 90 % confidence intervals are 0.032 0.139
BIC = -48.01
Fit based upon off diagonal values = 1
Measures of factor score adequacy
PA1 PA2 PA3 PA4
Correlation of (regression) scores with factors 0.98 0.99 0.94 0.88
Multiple R square of scores with factors 0.96 0.97 0.88 0.78
Minimum correlation of possible factor scores 0.93 0.94 0.77 0.55
> value.loading.rotate=print(rotate.four.factors$loadings,cutoff=0.3)
Loadings:
                         PA1     PA2     PA3     PA4
E-Commerce                     0.787
Advertising                    0.530
Product Line           0.525                   0.712
Competitive Pricing                           -0.590
PA1 PA2 PA3 PA4
SS loadings 2.635 1.967 1.641 1.371
Proportion Var 0.240 0.179 0.149 0.125
Cumulative Var 0.240 0.418 0.568 0.692
> fa.diagram(value.loading.rotate,main="Factor Analysis Diagram - With Rotation",col="Red",digits=1,rsize = 0.2,e.size=0.09,side=1,marg=c(0.5,.5,1,0))
>
> #comparing before and after rotation
> par(mfrow=c(1,2))
> fa.diagram(value.loading.non.rotate,main="Factor Analysis Diagram - No Rotation",col="Red",digits=1,rsize = 0.2,e.size=0.05,side=1,marg=c(0.5,0.5,1,0))
> fa.diagram(value.loading.rotate,main="Factor Analysis Diagram - With Rotation",col="Red",digits=1,rsize = 0.2,e.size=0.05,side=1,marg=c(0.5,.5,1,0))
>
> ######3.6 Output Interpretation Tell why only 4 factors are being asked in the questions and tell whether it is correct in choosing 4 factors. ######
> ######3.7 Create a data frame with a minimum of 5 columns, 4 of which are different factors and the 5th column is Customer Satisfaction######
> #combining the factor scores after rotation with the dependent variable, Customer Satisfaction
>
> mydata2 = cbind(rotate.four.factors$scores,mydata1[,12])
> #check the first few rows of new table formed
> head(mydata2)
PA1 PA2 PA3 PA4
[1,] -0.1338871 0.9175166 -1.719604873 0.09135411 8.2
[2,] 1.6297604 -2.0090053 -0.596361722 0.65808192 5.7
[3,] 0.3637658 0.8361736 0.002979966 1.37548765 8.9
[4,] -1.2225230 -0.5491336 1.245473305 -0.64421384 4.8
[5,] -0.4854209 -0.4276223 -0.026980304 0.47360747 7.1
[6,] -0.5950924 -1.3035333 -1.183019401 -0.95913571 4.7
>
> ######Name the factors with correct explanations ######
>
> colnames(mydata2)
[1] "PA1" "PA2" "PA3" "PA4" ""
> factor.names=c("Salesforce Quality","Effect of Marketing","Support & After Sales Service","Quality-Price ratio","Customer Satisfaction")
> colnames(mydata2)=factor.names
> colnames(mydata2)
[1] "Salesforce Quality" "Effect of Marketing" "Support & After Sales Service"
[4] "Quality-Price ratio" "Customer Satisfaction"
> class(mydata2)
[1] "matrix"
> mydata2=as.data.frame(mydata2)
> write.csv(head(mydata2),"mydata.csv")
> class(mydata2)
[1] "data.frame"
>
> new.cor.plot=cor(mydata2)
> dev.off()
null device
1
> corrplot(new.cor.plot,type="upper",method="number",tl.col="blue")
>
> #####3.8 Perform Multiple Linear Regression with Customer Satisfaction as the Dependent Variable and the four factors as Independent Variables######
>
> new.mlr.model = lm(mydata2$`Customer Satisfaction` ~., data = mydata2)
> summary(new.mlr.model)
Call:
lm(formula = mydata2$`Customer Satisfaction` ~ ., data = mydata2)
Residuals:
Min 1Q Median 3Q Max
-1.7125 -0.4708 0.1024 0.4158 1.3483
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.91800 0.06696 103.317 < 2e-16 ***
`Salesforce Quality` 0.57963 0.06857 8.453 3.32e-13 ***
`Effect of Marketing` 0.61978 0.06834 9.070 1.61e-14 ***
`Support & After Sales Service` 0.05692 0.07173 0.794 0.429
`Quality-Price ratio` 0.61168 0.07656 7.990 3.16e-12 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6696 on 95 degrees of freedom
Multiple R-squared: 0.6971, Adjusted R-squared: 0.6844
F-statistic: 54.66 on 4 and 95 DF, p-value: < 2.2e-16
>
> #checking Vif of new MLR model
> vif(new.mlr.model)
`Salesforce Quality` `Effect of Marketing` `Support & After Sales Service`
1.001021 1.002683 1.002981
`Quality-Price ratio`
1.005848
>
>
> #####3.9 MLR summary interpretation and significance (R, R2, Adjusted R2, Degrees of Freedom, f-statistic, coefficients along with p-values)######
> new.mlr.model$coefficients
(Intercept) `Salesforce Quality` `Effect of Marketing`
6.91800000 0.57962798 0.61978029
`Support & After Sales Service` `Quality-Price ratio`
0.05692291 0.61167972
> new.mlr.model$df.residual
[1] 95
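The quantities named in the 3.9 heading can be cross-checked by hand from the figures in the model summary above. The sketch below recomputes multiple R, adjusted R-squared and the F-statistic from the reported R-squared of 0.6971, n = 100 observations and p = 4 predictors. Because it starts from rounded inputs, agreement is only to about the printed precision, and the variable names are illustrative.

```r
# Values as reported in the model summary above.
r2 <- 0.6971   # Multiple R-squared
n  <- 100      # number of observations
p  <- 4        # number of predictors (the four factors)

df_resid   <- n - p - 1                           # residual degrees of freedom
multiple_r <- sqrt(r2)                            # multiple correlation R
adj_r2     <- 1 - (1 - r2) * (n - 1) / df_resid   # adjusted R-squared
f_stat     <- (r2 / p) / ((1 - r2) / df_resid)    # overall F-statistic
p_value    <- pf(f_stat, p, df_resid, lower.tail = FALSE)

print(round(c(R = multiple_r, adj_R2 = adj_r2, F = f_stat), 4))
print(df_resid)
```

The recomputed adjusted R-squared and F-statistic agree with the reported 0.6844 and 54.66, and the residual degrees of freedom equal the 95 printed above.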
DISCLAIMER : Ideas, references fetched & generated from Stack Overflow, http://statisticshowto.datasciencecentral.com/