## # A tibble: 6 x 4
##      ID Months NoBought Purchase
##   <int>  <int>    <int>    <int>
## 1  2995      1        0        0
## 2  2996      9        1        1
## 3  2997      9        0        0
## 4  2998     28        1        0
## 5  2999      6        1        0
## 6  3000     10        0        0
## Classes 'tbl_df', 'tbl' and 'data.frame': 1000 obs. of 4 variables:
## $ ID : int 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 ...
## $ Months : int 30 12 18 27 4 35 4 23 10 21 ...
## $ NoBought: int 0 0 0 1 1 0 0 0 0 0 ...
## $ Purchase: int 0 0 0 0 0 0 0 0 0 0 ...
## - attr(*, "spec")=List of 2
## ..$ cols :List of 4
## .. ..$ ID : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ Months : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ NoBought: list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ Purchase: list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## ..$ default: list()
## .. ..- attr(*, "class")= chr "collector_guess" "collector"
## ..- attr(*, "class")= chr "col_spec"
For the Linear Discriminant Model created using the lda() function from the MASS package, we
obtain -0.0511 as the coefficient of Months and 1.5067 as the coefficient of NoBought.
Fisher's Discriminant Model
##
## Descriptive Discriminant Analysis
## ---------------------------------
## $power discriminant power
## $values table of eigenvalues
## $discrivar discriminant variables
## $discor correlations
## $scores discriminant scores
## ---------------------------------
##
## $power
##            cor_ratio  wilks_lamb   F_statistic    p_values
## Months   0.020792889 0.979207111  21.191944558 0.000004691
## NoBought 0.093146548 0.906853452 102.508574564 0.000000000
##
##
## $values
##     value proportion accumulated
## DF1 0.115    100.000     100.000
##
##
## $discrivar
##               DF1
## constant  0.09878
## Months   -0.05120
## NoBought  1.50670
##
##
## $discor
##              DF1
## Months   -0.4339
## NoBought  0.9183
##
##
## $scores
##        z1
## 1 -1.4371
## 2 -0.5156
## 3 -0.8227
## 4  0.2232
## 5  1.4007
## 6 -1.6931
## ...
Fisher's model gives the coefficients of the variables directly, and its output also indicates
the importance of each variable through its correlation with the discriminant function.
In the output, the p-values for both variables are very small. Hence, we conclude that both
of them are statistically significant and help separate the customers who buy the book from
those who do not.
The correlation ratios in $power indicate that NoBought is roughly 4-5 times more important
than Months in statistically separating and classifying the customers into buyers and
non-buyers of the book.
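The columns of $power are linked by simple arithmetic, which the following minimal Python sketch re-derives from the printed correlation ratios. It assumes 1000 observations (the count shown in the str() output) and two groups (Purchase = 0 or 1); the function names are illustrative and are not part of any R package.

```python
# Correlation ratios copied from the $power table.
cor_ratio = {"Months": 0.020792889, "NoBought": 0.093146548}

n_obs = 1000    # assumption: "1000 obs." in the str() output
n_groups = 2    # assumption: two classes, Purchase = 0 and Purchase = 1


def wilks_lambda(eta2):
    """Univariate Wilks' lambda is the share of variance NOT explained by the groups."""
    return 1.0 - eta2


def f_statistic(eta2, n, k):
    """One-way ANOVA F statistic expressed through the correlation ratio."""
    return (eta2 / (1.0 - eta2)) * (n - k) / (k - 1)


for name, eta2 in cor_ratio.items():
    lam = wilks_lambda(eta2)
    f_val = f_statistic(eta2, n_obs, n_groups)
    print(f"{name:9s} wilks_lamb={lam:.9f} F_statistic={f_val:.4f}")

# Relative importance of the two predictors, as a ratio of correlation ratios.
importance_ratio = cor_ratio["NoBought"] / cor_ratio["Months"]
print(f"NoBought / Months = {importance_ratio:.2f}")
```

Under these assumptions the recomputed values agree with the wilks_lamb and F_statistic columns, and the ratio of the two correlation ratios comes out at about 4.5, which is where the "4-5 times" statement comes from.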
Along with the coefficients there is a constant term, which acts as the cut-off adjustment,
much like the intercept in regression. Hence the equation is,
Z = 0.09878 - 0.05120(Months) + 1.50670(NoBought)
From this equation a Z score is calculated for each customer and used to predict the class.
The correlations in $discor show that Months has a correlation of -0.43 with the discriminant
function and NoBought a correlation of 0.92. From this we infer that the number of books
purchased is far more important than the number of months since the last purchase in
predicting which customers buy.
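As a check on the equation, the sketch below recomputes Z in Python for the first six (Months, NoBought) pairs shown in the str() output and compares the results with the first six values printed under $scores. It assumes those six scores belong to the same six records, and the function name fisher_score is illustrative.

```python
# Coefficients copied from the $discrivar table.
CONSTANT = 0.09878
COEF_MONTHS = -0.05120
COEF_NOBOUGHT = 1.50670


def fisher_score(months, no_bought):
    """Discriminant score Z for one customer."""
    return CONSTANT + COEF_MONTHS * months + COEF_NOBOUGHT * no_bought


# First six (Months, NoBought) pairs from the str() output.
# Assumption: these are the records behind the first six $scores rows.
records = [(30, 0), (12, 0), (18, 0), (27, 1), (4, 1), (35, 0)]
printed_scores = [-1.4371, -0.5156, -0.8227, 0.2232, 1.4007, -1.6931]

computed_scores = [fisher_score(m, b) for m, b in records]

for (months, no_bought), z, printed in zip(records, computed_scores, printed_scores):
    print(f"Months={months:2d} NoBought={no_bought}  Z={z:8.4f}  printed={printed:8.4f}")
```

The recomputed scores match the printed ones to within rounding of the coefficients, which supports reading the constant as 0.09878.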
The Mahalanobis discriminant model gives its output as two separate equations, one for
buying and one for not buying the book. From these equations, we can compute a score for each
record for both buying and not buying the book.
For buying : -4.551 + 0.135(Months) + 2.802(NoBought)
For not buying : -1.552 + 0.201(Months) + 0.858(NoBought)
Thus, two scores are calculated for each record from the above equations, and the record is
assigned to the class with the higher score.
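The two-score rule can be sketched in Python with the rounded coefficients quoted above. One assumption is stated explicitly: the function with constant -4.551 is treated as the score for buying (Purchase = 1) and the function with constant -1.552 as the score for not buying (Purchase = 0). With that pairing, the difference between the two scores points in the same direction as Fisher's function Z, and the rule seldom predicts a buyer, which agrees with the behaviour reported for the model. The record (4, 2) is hypothetical and is included only to show a case where the buying score wins.

```python
# Classification functions as (constant, Months coefficient, NoBought coefficient).
# Assumption: the pairing of each function with its class label, as described above.
CLASS_FUNCTIONS = {
    0: (-1.552, 0.201, 0.858),   # not buying
    1: (-4.551, 0.135, 2.802),   # buying
}


def class_scores(months, no_bought):
    """Return the score of every class for one record."""
    return {
        label: const + b_months * months + b_bought * no_bought
        for label, (const, b_months, b_bought) in CLASS_FUNCTIONS.items()
    }


def classify(months, no_bought):
    """Assign the record to the class with the higher score."""
    scores = class_scores(months, no_bought)
    return max(scores, key=scores.get)


# (30, 0) and (4, 1) appear in the str() output; (4, 2) is a hypothetical record.
for months, no_bought in [(30, 0), (4, 1), (4, 2)]:
    scores = class_scores(months, no_bought)
    print(f"Months={months:2d} NoBought={no_bought} "
          f"not_buying={scores[0]:.3f} buying={scores[1]:.3f} "
          f"-> predicted class {classify(months, no_bought)}")
```

In this sketch a customer with a single earlier purchase is still classified as a non-buyer, and it takes a second purchase for the buying score to overtake the other one.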
The model also works well on the test data, where its overall accuracy is again 90.9%.
However, the model rarely predicts actual buyers (class 1) as buyers. Because the classes
are unbalanced, a suitable cut-off value should be chosen based on business judgment.
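The effect of moving the cut-off can be illustrated with the six Fisher scores printed under $scores. The Python sketch below assumes that a higher Z indicates a likely buyer, and the three cut-off values are hypothetical choices for illustration rather than values taken from the analysis.

```python
# First six discriminant scores, copied from the $scores output.
printed_scores = [-1.4371, -0.5156, -0.8227, 0.2232, 1.4007, -1.6931]


def flag_buyers(scores, cutoff):
    """Flag a record as a predicted buyer (1) when its score exceeds the cut-off."""
    return [1 if z > cutoff else 0 for z in scores]


# Hypothetical cut-offs: lowering the cut-off flags more customers as buyers.
flag_counts = {}
for cutoff in (1.5, 0.5, 0.0):
    flags = flag_buyers(printed_scores, cutoff)
    flag_counts[cutoff] = sum(flags)
    print(f"cutoff={cutoff:4.1f} flags={flags} predicted_buyers={sum(flags)}")
```

A lower cut-off catches more potential buyers at the cost of more false alarms, so the choice depends on how the cost of a wasted mailing compares with the value of a missed sale.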