SARADA SARIPALLI
IMDB Movie dataset analysis to predict movie success rate.
Purpose: Run a linear regression analysis on the 12-variable dataset to predict a movie's IMDB rating.
1. Which of the variables are critical in determining the IMDB rating of a movie?
2. Is there any correlation between genre and IMDB rating, director name and IMDB rating,
and runtime and IMDB rating?
3. Predict the IMDB score using the model.
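The correlation question in item 2 can be illustrated with a small plain-Python sketch of the Pearson coefficient. This is not the code used in the slides; the function name pearson and the sample values are made up for illustration. It applies directly only to numeric columns such as runtime; genre and director name are categorical and would need to be encoded as numbers first.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values for illustration, not rows from IMDB-Movie-Data.csv.
runtimes = [90.0, 100.0, 120.0, 150.0]
ratings = [6.0, 6.5, 7.5, 8.0]
r = pearson(runtimes, ratings)
print(f"runtime vs rating correlation: {r:.3f}")
```

A value near +1 or -1 suggests a strong linear relationship; a value near 0 suggests little linear relationship.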
Filename : IMDB-Movie-Data.csv.
Source : https://www.kaggle.com/kerneler/starter-idmb-movie-76b7f8b7-1
1000 movie titles
Dataset has 12 columns and 1000 rows.
IMDB Schema details.
root
|-- Rank: integer (nullable = true)
|-- Title: string (nullable = true)
|-- Genre: string (nullable = true)
|-- Description: string (nullable = true)
|-- Director: string (nullable = true)
|-- Actors: string (nullable = true)
|-- Year: string (nullable = true)
|-- Runtime (Minutes): string (nullable = true)
|-- Rating: string (nullable = true)
|-- Votes: string (nullable = true)
|-- Revenue (Millions): double (nullable = true)
|-- Metascore: double (nullable = true)
Read data into dataset.
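A minimal sketch of the loading step, using only the Python standard library rather than whatever reader the slides used. The schema above lists Year, Runtime (Minutes), Rating and Votes as strings, so the sketch converts the numeric columns to floats. The helper name read_movies and the two sample rows are hypothetical.

```python
import csv
import io

# Columns from the schema that hold numbers but may arrive as text.
NUMERIC_COLUMNS = [
    "Year", "Runtime (Minutes)", "Rating", "Votes",
    "Revenue (Millions)", "Metascore",
]

def read_movies(handle):
    """Parse CSV rows into dicts, converting numeric columns to float (None if blank)."""
    rows = []
    for row in csv.DictReader(handle):
        for col in NUMERIC_COLUMNS:
            if col in row:
                value = (row[col] or "").strip()
                row[col] = float(value) if value else None
        rows.append(row)
    return rows

# In-memory sample with made-up rows and a subset of the columns.
sample_csv = (
    "Title,Genre,Runtime (Minutes),Rating,Metascore\n"
    "Example Movie A,Drama,120,7.5,70\n"
    "Example Movie B,Comedy,95,6.1,\n"
)
movies = read_movies(io.StringIO(sample_csv))
print(len(movies), "rows loaded")
```

For the real file, the same function could be called on a handle opened with open("IMDB-Movie-Data.csv", newline="", encoding="utf-8"), assuming the file is in the working directory.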
Step 2 Data Cleaning & Exploration
The dataset contains 12 columns: 11 feature columns and the rating, which is the value
to predict.
We create transformers and assemble the feature columns into a single vector.
Linear regression models can only operate on numeric data, so we need to transform the
categorical features into numerical data.
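One common way to turn a categorical column such as Genre into numbers is to index the labels and then one-hot encode the index. The sketch below shows that idea in plain Python; it is an illustration, not the transformers used in the slides, and the names fit_string_index and one_hot and the sample labels are hypothetical.

```python
from collections import Counter

def fit_string_index(values):
    """Map each distinct label to an integer index, most frequent label first."""
    counts = Counter(values)
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: index for index, label in enumerate(ordered)}

def one_hot(index, size):
    """Return a list of `size` zeros with a single 1.0 at position `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

# Made-up labels for illustration.
genres = ["Drama", "Action", "Drama", "Comedy", "Action", "Drama"]
mapping = fit_string_index(genres)
encoded = [one_hot(mapping[g], len(mapping)) for g in genres]
print(mapping)
```

One-hot encoding avoids implying an order between categories, which a bare integer index would do if fed straight into a linear model.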
Create the linear regression estimator.
Create the pipeline and fit the estimator.
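To show what fitting a linear regression means, here is a single-feature ordinary least squares sketch in plain Python. The slides fit a model on all the assembled features; this simplified version, with the hypothetical names fit_simple_linear_regression and predict and made-up numbers, only illustrates the principle.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a slope when every x is identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new x value."""
    slope, intercept = model
    return intercept + slope * x

# Made-up training points that lie exactly on y = 1 + 2x.
model = fit_simple_linear_regression([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(model, predict(model, 5.0))
```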
Step 4 Making Predictions
Use the Regression Evaluator to test how well our model performed.
Compute the root mean squared error (RMSE).
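RMSE is the square root of the mean squared difference between the actual and the predicted ratings. A minimal plain-Python sketch of the metric, with a hypothetical function name rmse and made-up values:

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length numeric lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up ratings: every prediction is off by exactly 1 point.
error = rmse([6.0, 7.0, 8.0], [7.0, 8.0, 9.0])
print(f"RMSE: {error:.2f}")
```

The result is in the same units as the rating, so a lower value means predictions are closer to the actual scores on average.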
Conclusion :