Group 1
Abhinav Aravabhumi
Megha Nellutla
Kavya Chandershekar
Prateek Sahu
1. Reading books.txt, generating the count dataset and printing the first 10 records.
Solution:
Code for reading the text file and the subsequent steps
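/* keep only barnesandnoble.com transactions */
data barnesnoble;
set project.books;
if domain = "barnesandnoble.com";
run;
/* barnesnoblesum is assumed to come from a per-user summary step (not shown); */
/* _TYPE_ and _FREQ_ are the automatic variables such a step adds */
data barnesnoblesum;
set barnesnoblesum (drop = _TYPE_ _FREQ_);
if userid = . then delete; /* drop rows without a user id */
run;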
2. Build an NBD model, ignoring the demographic variables. Report your results.
Solution:
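/* keep only amazon.com transactions */
data amazonbookcount;
set project.books;
if domain = "amazon.com";
run;
/* amazonbookssum is assumed to come from a per-user summary step (not shown) */
data amazonbookssum;
set amazonbookssum (drop = _TYPE_ _FREQ_);
if userid = . then delete; /* drop rows without a user id */
run;
/* merging barnesandnoble count dataset with amazon count dataset */
data bothbooks;
merge amazonbookssum barnesnoblesum;
by userid;
if NumBooks = . then NumBooks = 0; /* treat a missing count as zero books */
run;
/* nbdmodel is assumed to be created in a step not shown */
data nbdmodel;
set nbdmodel (drop = _TYPE_ _FREQ_);
if NumBooks = . then delete;
run;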
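The count-and-merge logic of the data steps above can be sketched outside SAS. This is a minimal Python illustration, not the code used in the project: the `(userid, domain, qty)` tuple layout and the function names are assumptions, and only the output names `NumBooks` and `NumBooksamazon` are taken from the SAS code.

```python
from collections import Counter


def count_books(transactions, domain):
    """Total quantity per user for one domain.

    transactions: iterable of (userid, domain, qty) tuples (hypothetical layout).
    """
    counts = Counter()
    for userid, dom, qty in transactions:
        if dom == domain:
            counts[userid] += qty
    return counts


def merge_counts(amazon, bn):
    """One row per user seen at either site, ordered by userid.

    A missing barnesandnoble.com count becomes 0, mirroring
    `if NumBooks = . then NumBooks = 0;`. A missing amazon count stays None.
    """
    users = sorted(set(amazon) | set(bn))
    return [
        {"userid": u, "NumBooksamazon": amazon.get(u), "NumBooks": bn.get(u, 0)}
        for u in users
    ]
```

Users who bought only at amazon.com end up with `NumBooks` equal to 0, which is what lets the NBD model see the zero class.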
Results:
Code for NBD model
The optimal values of r and alpha are 0.1299 and 0.09723, respectively. The fit statistics and
parameter estimates were provided as a screenshot.
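For reference, the quantity that the estimation maximizes can be written down directly. The following is a minimal Python sketch of the NBD log-likelihood for a unit observation period, assuming the standard gamma-Poisson parameterization with shape r and rate alpha; the function names are illustrative and this is not the SAS code used in the project.

```python
import math


def nbd_log_pmf(x, r, alpha):
    """log P(X = x) for an NBD with shape r and rate alpha (unit time period)."""
    return (
        math.lgamma(r + x) - math.lgamma(r) - math.lgamma(x + 1)
        + r * math.log(alpha / (alpha + 1.0))
        - x * math.log(alpha + 1.0)
    )


def nbd_log_likelihood(counts, r, alpha):
    """Sum of log-probabilities over the observed counts (one count per user)."""
    return sum(nbd_log_pmf(x, r, alpha) for x in counts)
```

Working with `math.lgamma` rather than raw gamma functions keeps the computation stable for users with large counts.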
3. Calculate the values of (i) Reach, (ii) Average Frequency, and (iii) Gross Rating Points
(GRPs) based on the NBD Model. Show your work.
Solution:
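One way to obtain the three quantities from the fitted NBD parameters is sketched below in Python. It assumes the usual definitions: reach is the percentage of users with at least one purchase, GRPs equal 100 times the mean number of purchases per user, and average frequency is GRPs divided by reach. The function name is illustrative, and the period is the observation window of the data.

```python
def nbd_reach_frequency_grps(r, alpha):
    """Reach (%), average frequency, and GRPs implied by an NBD(r, alpha)."""
    p0 = (alpha / (alpha + 1.0)) ** r   # P(X = 0): share of users with no purchase
    reach = 100.0 * (1.0 - p0)          # percent of users with at least one purchase
    grps = 100.0 * (r / alpha)          # 100 x mean purchases per user
    avg_frequency = grps / reach        # mean purchases among users who were reached
    return reach, avg_frequency, grps


# Parameters reported in question 2
reach, avg_frequency, grps = nbd_reach_frequency_grps(0.1299, 0.09723)
```

By construction, reach multiplied by average frequency gives back the GRPs.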
Solution:
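/* keep only the barnesandnoble.com count for the Poisson regression */
data poissonbooks;
set bothbooks (drop=NumBooksamazon);
run;
/* recode the '*' placeholder in region as missing */
data poissonbooks;
set poissonbooks;
if region='*' then call missing(region);
run;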
/* building Poisson Regression Model */
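The objective that a Poisson regression maximizes can be sketched as follows. This is a Python illustration under assumptions, not the SAS model code: each user's rate is taken to be `lambda0 * exp(b . x)`, and the names `rows`, `lambda0`, and `betas` are hypothetical.

```python
import math


def poisson_reg_log_likelihood(rows, lambda0, betas):
    """Log-likelihood of a Poisson regression.

    rows: list of (count, covariates) pairs, covariates being a sequence of numbers.
    lambda0: baseline rate; betas: one coefficient per covariate.
    """
    ll = 0.0
    for count, x in rows:
        rate = lambda0 * math.exp(sum(b * xi for b, xi in zip(betas, x)))
        # log of the Poisson pmf: count * log(rate) - rate - log(count!)
        ll += count * math.log(rate) - rate - math.lgamma(count + 1)
    return ll
```

With all coefficients set to zero the expression reduces to the plain Poisson log-likelihood with rate `lambda0`.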
Results:
Managerial Takeaways:
5. Next, we start the setup for developing an NBD regression model. What is the formula
for the log-likelihood expression, LL?
Solution:
LL = log((gamma(r+NumBooks)/(gamma(r)*fact(NumBooks))) * ((alpha/(alpha+expBx))**r) * ((expBx/(alpha+expBx))**NumBooks))
where
expBx = exp((b1*region)+(b2*hhsz)+(b3*age)+(b4*income)+(b5*child)+(b6*race)+(b7*country))
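The per-user log-likelihood term above translates directly into code. The Python sketch below mirrors the expression using log-gamma functions for numerical stability; the function name and the argument layout (a list of covariate values `x` matched to a list of coefficients `betas`) are assumptions made for illustration.

```python
import math


def nbd_reg_log_pmf(num_books, x, r, alpha, betas):
    """log P(NumBooks = num_books) under the NBD regression.

    x: covariate values (for example region, hhsz, age, ...); betas: matching b1, b2, ...
    """
    exp_bx = math.exp(sum(b * xi for b, xi in zip(betas, x)))
    return (
        math.lgamma(r + num_books) - math.lgamma(r) - math.lgamma(num_books + 1)
        + r * math.log(alpha / (alpha + exp_bx))
        + num_books * math.log(exp_bx / (alpha + exp_bx))
    )


def nbd_reg_log_likelihood(rows, r, alpha, betas):
    """Total LL over rows of (num_books, covariates)."""
    return sum(nbd_reg_log_pmf(n, x, r, alpha, betas) for n, x in rows)
```

Setting every coefficient to zero gives `exp_bx = 1`, which recovers the NBD model without covariates.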
6. Build an NBD regression model using the demographic information provided. Report
your results. What are the managerial takeaways: which customer characteristics seem to
be important?
Optional: As with the Poisson regression, you have flexibility in choosing the variables
to include. If you wish, you can eliminate some (via feature selection, for example) or
create new ones from the variables you have available (for example, fraction of weekend
purchases). This is optional for this project, but if you do anything along these
lines, please provide your justification.
Solution:
Managerial Takeaways:
7. Are there any significant differences between the results from the Poisson and NBD
regressions? If so, what exactly is the difference? Discuss what you believe about the
cause(s) of the difference.
Solution:
Difference between NBD regression and Poisson regression models:
Comparing the fit statistics of the two models shows the differences.
Poisson Regression model Fit Statistics
To compare the models, we need only the log-likelihood and BIC (Bayesian information
criterion) values of each. From these values we can infer the following.
● The log-likelihood for the Poisson regression model is -18834 and its BIC is 37751,
whereas the log-likelihood for the NBD regression model is -8364.5 and its BIC is
16820. In general, a higher log-likelihood and a lower BIC both indicate a better model.
Here the two criteria agree: the NBD regression has the higher log-likelihood and the
lower BIC, so it is the better fit overall.
● Another difference is that region and race are the only significant variables in the NBD
regression for determining the number of books purchased, whereas in the Poisson
regression model the purchasing behavior depends on age, region, income, child, and
race. The variables that are not significant in the NBD model are education, country, age,
household size, child, and income, whereas the variable that is not significant in the
Poisson regression model is household size.
● The difference between the two models likely stems from their different assumptions.
The Poisson distribution assumes that the mean and the variance are equal. When the data
show a variance greater than the mean (overdispersion), negative binomial regression is
more flexible than Poisson regression: the NBD model has shape and rate parameters that
let the variance adjust independently of the mean. Hence the NBD model is appropriate
for count data with overdispersion. If the variance were equal to the mean, Poisson
regression would be the more appropriate choice.
● Another inference is that the Poisson model was better at predicting larger numbers of
books, while the NBD model was better at predicting smaller numbers of books. However,
neither model was a perfect fit for the observed purchasing behavior.
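The fit comparison and the overdispersion argument can both be checked with a few lines of arithmetic. The Python sketch below assumes the usual definition BIC = -2·LL + k·ln(n); because the number of parameters k and the sample size n are not stated here, it only recovers the penalty term from the reported values. The variance formula uses the no-covariate NBD parameters from question 2, and the function names are illustrative.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """BIC = -2 * LL + k * ln(n); lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def bic_penalty(log_likelihood, bic_value):
    """Recover the k * ln(n) penalty from a reported LL and BIC."""
    return bic_value + 2.0 * log_likelihood


def nbd_mean_variance(r, alpha):
    """Mean and variance of an NBD(r, alpha); the variance always exceeds the mean."""
    mean = r / alpha
    variance = mean + mean ** 2 / r
    return mean, variance


poisson_penalty = bic_penalty(-18834, 37751)   # Poisson regression values reported above
nbd_penalty = bic_penalty(-8364.5, 16820)      # NBD regression values reported above
mean, variance = nbd_mean_variance(0.1299, 0.09723)
```

The penalty terms are small relative to the gap between the two log-likelihoods, so the BIC ranking is driven almost entirely by fit rather than by the extra NBD parameter.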
8. Briefly summarize what you learned from this project. This is an open-ended question,
so please include anything you found worthwhile: relating to the modeling tool (SAS), the
modeling process, insights from the modeling, any managerial takeaways that were
insightful to you, and so on.
Solution: