
Business Analytics
OUR ANALYSIS- USING LOGISTIC
REGRESSION

Logistic regression is a statistical method for analyzing a dataset in which one or more independent variables determine an outcome.

The outcome is measured with a dichotomous variable (one with only two possible outcomes).
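As a minimal sketch of the idea (in Python, chosen here only for illustration), a logistic model turns a weighted sum of the independent variables into a probability between 0 and 1, which is then cut at a threshold to give the two-way outcome. The function names and the 0.5 threshold are assumptions for this example, not taken from the slides.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real-valued score z to a probability in (0, 1)."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(intercept: float, coefs: list, x: list) -> float:
    """Probability of the positive outcome for one observation x."""
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    return sigmoid(z)


def classify(p: float, threshold: float = 0.5) -> int:
    """Dichotomous outcome: 1 if p reaches the (assumed) threshold, else 0."""
    return 1 if p >= threshold else 0
```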

Works on the maximum likelihood concept.

The coefficients are fitted so that the predicted probability p(xi) of default for each individual is as close as possible to that individual's observed default probability.
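The maximum likelihood idea can be sketched as follows: pick the coefficients that make the observed outcomes most probable under the model. This example uses plain gradient ascent on the log-likelihood with a single predictor. The toy data, learning rate, and step count are made up for illustration, and statistical packages typically use a different optimizer.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0: float, b1: float, xs: list, ys: list) -> float:
    """Sum of log-probabilities the model assigns to the observed outcomes."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit(xs: list, ys: list, lr: float = 0.1, steps: int = 2000):
    """Gradient ascent on the mean log-likelihood (illustrative settings)."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # observed minus predicted
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Made-up toy data (not from the movie dataset): larger x, more 1s.
toy_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
toy_y = [0, 0, 1, 0, 1, 1]
b0, b1 = fit(toy_x, toy_y)
```

After fitting, the log-likelihood at (b0, b1) is higher than at the starting point (0, 0), which is what "as close as possible" means in this setting.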
INTRODUCTION

In the era of Web 2.0, consumers easily share their ratings or comments with other people after watching a movie.

To measure the quality of a film, people can:
- Rely on critics (more time-consuming)
- Use their own instincts (simple and often a good indicator, but unreliable)

There should be a better way to tell whether a movie will win awards without relying on critics or our own instincts.

This study develops an award prediction model that uses the logistic regression technique.
Question
Will a movie win an award, considering factors such as the number of reviews, the movie's rating count, award nominations, etc.?
DATASET

1650 movies

Independent variables: IMDb Rating, Rating Count, Number of Nominations, Number of Photos, Number of News Articles, Number of User Reviews, Number of Genres

Dependent variable: Award win (Yes/No)

Dichotomous scale

Regression technique: Logistic Regression


METHODOLOGY

- Data collection & streamlining
- Running logistic regression
- Analyzing results
- Prediction
- Accuracy check
ANALYSIS
- Divided the data into training and test data
- On the training data: logistic regression analysis using R
- Used the regression coefficients to predict the values on the test data, then checked the accuracy of the model
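The first step, dividing the data into training and test sets, might look like the sketch below. It is written in Python for illustration, whereas the analysis itself was run in R. The 25% test fraction and the fixed seed are assumptions, since the slides do not state the split that was used.

```python
import random


def train_test_split(rows: list, test_fraction: float = 0.25, seed: int = 0):
    """Shuffle a copy of rows and split it into (train, test).

    test_fraction and seed are illustrative defaults, not the study's values.
    """
    rng = random.Random(seed)   # fixed seed makes the split reproducible
    shuffled = rows[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

The model is fitted only on the training rows. The test rows are held back so that the accuracy check measures performance on movies the model has not seen.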
STEP 1: Collecting the data set
STEP 2: Logistic regression on training data
STEP 3: Prediction on test data
ACCURACY OF THE MODEL
USING ONLY SIGNIFICANT VARIABLES
ACCURACY OF THE MODEL

Error: 0.043378995
Accuracy: 0.956621005
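A sketch of how such figures are commonly computed: turn each predicted probability into a yes/no prediction, count the share of test cases predicted correctly (accuracy), and take the remainder as the error. The 0.5 cut-off is an assumption, and the function name is illustrative.

```python
def accuracy_and_error(predicted_probs: list, actual: list, threshold: float = 0.5):
    """Return (accuracy, error) for 0/1 labels, with error = 1 - accuracy."""
    if not actual or len(predicted_probs) != len(actual):
        raise ValueError("need two non-empty lists of equal length")
    predicted = [1 if p >= threshold else 0 for p in predicted_probs]
    correct = sum(1 for y_hat, y in zip(predicted, actual) if y_hat == y)
    accuracy = correct / len(actual)
    return accuracy, 1.0 - accuracy
```

This matches the two reported figures, which sum to 1 (0.043378995 + 0.956621005).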
PREDICTED PROBABILITY OF WINNING USING LOGISTIC REGRESSION
[Figure: Classification. Predicted probability (0.00 to 1.00) plotted against IMDb rating (0 to 10).]
PREDICTED PROBABILITY OF WINNING USING LOGISTIC REGRESSION
[Figure: Classification. Predicted probability (0 to 1) plotted against number of nominations (0 to 140).]
INTERPRETATION- SIGNIFICANT VARIABLES

Significant variables:
- IMDb rating
- Number of nominations

Insignificant variables:
- Rating count
- Number of photos
- Number of news articles
- Number of user reviews
- Number of genres

Full model:
log(p(x) / (1 - p(x))) = -9.008 + (1.091*imdbRating) + (-0.000003268*ratingCount) + (1.810*nrOfNomination) + (-0.001*nrOfPhotos) + (-0.000049*nrOfNewsArticles) + (0.002*nrOfUserReviews) + (-0.066*nrOfGenre)

Model using only the significant variables:
log(p(x) / (1 - p(x))) = -8.9192 + (1.076*imdbRating) + (1.8920*nrOfNomination)
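To turn the two-variable model into a predicted probability, the linear expression can be passed through the logistic function. The sketch below uses the coefficients reported above and assumes they are on the log-odds scale, as is usual for logistic regression output. The function name and the sample inputs in the usage note are hypothetical.

```python
import math

# Coefficients of the model that uses only the significant variables.
INTERCEPT = -8.9192
COEF_IMDB_RATING = 1.076
COEF_NOMINATIONS = 1.8920


def win_probability(imdb_rating: float, nr_of_nominations: float) -> float:
    """Predicted probability of an award win (assumes log-odds coefficients)."""
    z = (INTERCEPT
         + COEF_IMDB_RATING * imdb_rating
         + COEF_NOMINATIONS * nr_of_nominations)
    return 1.0 / (1.0 + math.exp(-z))
```

For a hypothetical movie rated 7.0, `win_probability(7.0, 0)` stays below 0.5 while `win_probability(7.0, 1)` rises above it, which illustrates how strongly a single nomination shifts the prediction under these coefficients.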
APPLICATIONS

Movies are an integral part of modern culture, generating billions of dollars each year.

The film industry serves as a deeply expressive artistic medium and also has a significant impact on the global economy.

This model can be used to predict whether a movie will be critically acclaimed as an award-winning movie.
LIMITATIONS

Many potentially relevant factors are either impossible to quantify or difficult to obtain reliably:

- Profitability of a movie's release period
- Popularity of a movie's content
- Strength of a movie's marketing strategies
- Advertising budget of a movie
CONCLUSION

Logistic regression was a suitable solution for this yes/no prediction.

Two independent variables carry most of the weight: IMDb rating and number of nominations.

Changes in IMDb rating and number of nominations significantly affect the probability of a movie winning an award.

High accuracy and low error (model strength is good).
Thank You!
