
Implementation and evaluation of a movie recommender system using

Collaborative Filtering algorithm


Vishita Singh
College Station, Texas
vishita.singh@tamu.edu

Abstract

Recommender Systems have been an active area of research since the mid-1990s and have gained popularity primarily due to a surge in the e-commerce and entertainment industries in recent times. Recommender Systems use machine learning algorithms to suggest items to a user based on the user's preferences. This paper implements a recommender system using a supervised learning algorithm, Collaborative Filtering (CF), through the K-Nearest Neighbors method. The implementation was done using open-source libraries, and the key challenges faced through the process are presented in this paper. The algorithm was evaluated using different similarity indexes, and the quality of those similarity indexes was in turn measured with the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Fraction of Concordant Pairs (FCP) evaluation metrics. The results are compared, and observations are presented for different values of k in the K-Nearest Neighbors algorithm. This paper also identifies some practical challenges in implementing a recommender system and suggests methods to overcome them. A literature review is also provided on Recommender Systems using CF and on the evaluation of Recommender Systems.

Introduction

Making efficient decisions consistently to improve different aspects of life has always been a challenge for human beings. The human mind possesses limited analytical and processing power: the probability of making a correct decision falls sharply as the dimensionality and volume of the information and data involved in the decision grow.

In recent times, the growth of human interaction with the World Wide Web has widened the scope for studying human behavior patterns to support better decision making. This has intensified competition among businesses to offer personalized recommendations to their customers by analyzing consumer behavior and traits. As a result, consumers are presented with a never-ending set of alternatives to satisfy their wants and requirements. With the abundance of data collection techniques and the ever-increasing volumes of raw data collected by enterprises today, persisting and processing the data to extract meaningful business insights is a major challenge. All of these factors steered the creation of sophisticated data mining algorithms that collect huge volumes of data from a large number of transactions and study the patterns in those transactions to provide meaningful insights into the similarities and dissimilarities in human behavior.

Recommender systems are a subset of these data mining algorithms: they study human behavior from past data points and help make efficient decisions. These systems also help businesses make better recommendations and advertise their products and solutions to their customer base by analyzing consumer behavior.

Recommender systems are used on multiple fronts in today's world. They predict consumer actions and requirements in a grocery store, recommend music records, books, and movies based on personal preferences, and much more. Businesses also use these systems to implement customer retention techniques, since they make it possible to study consumer interaction with various products and solutions and thus predict customer behavior better. The systems can even predict a new user's behavior by calculating that user's similarity with the existing customer base.

This paper deals with the problem of recommending movies to users based on their personal preferences using a recommendation technique called collaborative filtering. Collaborative filtering is a recommendation technique that uses a user's previous movie ratings and the ratings of similar users to predict that user's affinity toward a movie. Although this paper deals with the association of movies and users, collaborative filtering is applicable to any other product or item as well.

In this paper, a movie recommender system has been implemented using collaborative filtering through the K-Nearest Neighbors (KNN) algorithm to

recommend movie choices to users based on their inputs and preferences. The KNN algorithm works by defining neighborhoods of similar entities, either item-based or user-based, under the assumption that entities in the same neighborhood have similar interactions and behaviors.

The paper provides insights into effectively designing and training a movie recommendation system for a user base using the techniques of collaborative filtering. It also dives deep into the implementation and evaluation of these techniques using real data from the MovieLens website made available by GroupLens (Grouplens, 2018). The paper also highlights the pros and cons associated with collaborative filtering and presents problems that can and cannot be solved using this technique.

This paper can be summarized as follows:

• Provides insights into the design, development, and training of a movie recommender system with collaborative filtering using correlation and K-Nearest Neighbors methods.
• Presents an overview of the evaluation of the data model created for the movie recommender system using correlation and the k-NN algorithm.
• Highlights key challenges encountered while implementing a recommender system using open-source libraries, and provides solutions.

Figure 1: Components of a Recommender System

Related Work

Recommender systems truly emerged as an independent area of study in the mid-1990s, when researchers started explicitly focusing on the rating structure and the recommender system problem was reduced to the problem of estimating ratings for items that have not been seen by a user (Adomavicius & Tuzhilin, June 2005). Since then, a lot of work has been done in academia as well as in industry. Recommender systems proactively predict user preferences based on the user's past preferences. The term collaborative was introduced with a commercial recommender system called Tapestry, which was used to provide users with recommendations on newsgroup documents (Melville & Sindhwani, April 2010). Some of the earliest work in this field was done by GroupLens Research (Resnick, Iacovou, Sushak, Bergstrom, & Reidl, 1994a), a research lab in the Department of Computer Science and Engineering at the University of Minnesota. GroupLens has made available multiple datasets, such as MovieLens, which provides rating data from the MovieLens website (Grouplens, 2018). The initial formulations of recommender systems were based on statistical correlation and predictive modeling rather than machine learning models; subsequently, collaborative filtering was mapped to classification, which also allowed the use of dimensionality reduction to improve model precision (Melville & Sindhwani, April 2010).

Collaborative filtering gained traction because of its direct use in e-commerce; a notable mention is the Netflix Prize competition for the best collaborative filtering algorithm to predict user ratings for films, for which the company released a large dataset containing 100 million user ratings. Collaborative Filtering (CF) is now widely used in recommending movies, music, e-commerce items, news, and research papers, among others (Haruna, Akmar Ismail, Damiasih, Sutopo, & Herawan, Oct 2017). This method is one of the most successful techniques used in recommender systems, which recommend items to a user based on what similar users have rated (Haruna, Akmar Ismail, Damiasih, Sutopo, & Herawan, Oct 2017).

Collaborative filtering remains an active area of research because of its use in a plethora of applications that provide personalized recommendations to assist users in dealing with insurmountable volumes of information. Much work remains to be done on taking user behavior and its representation into consideration. Using not just explicit ratings but also implicit user behavior (for example, a user closing a movie within 15 minutes of starting it) is an active area of research. Advanced recommendation modeling methods incorporate various contextual information into the recommendation process, utilize multi-criteria ratings, and develop less intrusive systems that are more effective in measuring system performance (Adomavicius & Tuzhilin, June 2005).

AI Approach

The fundamentals of any recommender system rest on a combination of the type of data available in the da-

taset, the filtering algorithm, and the chosen model. The data extraction techniques used are data mining techniques, and the quality of incoming data varies with the source and nature of the data (Maheshwari, 2015). Recommender systems are divided into three main categories, depending on the information used to drive the recommendations: collaborative, content-based, and hybrid filtering (Portugal, Alencar, & Cowan, May 2018). The collaborative approach provides personalized recommendations to a user based on ratings from that user as well as from other users who have similar preferences, while the content-based approach recommends items to a user based on similar items the user has previously rated. The hybrid approach combines attributes of more than one type of recommender system (Recommender Systems, 2014). Hybrid recommender systems can be implemented by making content-based and collaborative predictions separately and then combining them, or by adding content-based capabilities to a collaborative approach.

Collaborative filtering (CF) is a popular recommendation algorithm that bases its predictions and recommendations on the ratings or behavior of other users in the system (Phorasim & Yu, 2017). CF methods can be further subdivided into memory-based and model-based approaches. The memory-based approach relies on literal (explicit) memory of past user ratings (Recommender Systems, 2014), and neighborhood-based methods are also commonly referred to as memory-based approaches (Breese, Heckerman, & Kadie, July 1998). The Amazon and YouTube recommender systems exploit collaborative and heuristic-based approaches (Park, Park, Jung, & Lee, May 2015).

Figure 2: AI Approach for a Recommender System using Collaborative Filtering

The assumption used in the collaborative filtering model is that if certain users have the same preferences regarding certain items, then they will most likely have the same opinion about other items as well. The concept behind this algorithm is that if a user's preferences are known, a similarity can be calculated with other users who have similar preferences.

The K-Nearest Neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. Collaborative filtering using k-NN works by gathering user ratings of items and calculating similarities between users (Park, Park, Jung, & Lee, May 2015). Using this similarity, predictions can be made for a user's unrated items. In neighborhood-based techniques, a subset of users is chosen who are highly similar to the user and who have rated the item for which a prediction needs to be made. There are several methods to calculate the similarities between users, such as adjusted cosine-based similarity, cosine-based similarity, and correlation-based similarity (Phorasim & Yu, 2017). Because not all users rate all items, the dataset contains missing values, and different similarity indexes treat missing values differently; missing values can be filled in with average, modal, or default values (Maheshwari, 2015). In this implementation, the Pearson correlation (adjusted cosine) coefficient is used as the similarity measure for the user-based CF algorithm because it weights each neighbor's rating by its similarity value within the neighborhood. The similarity sim(a,n) is calculated as:

sim(a,n) = Σ_i (r_a,i − r̄_a)(r_n,i − r̄_n) / ( √(Σ_i (r_a,i − r̄_a)²) · √(Σ_i (r_n,i − r̄_n)²) )

where the sums run over the items i rated by both users a and n.

Using the similarity sim(a,n), ratings are predicted for the items not viewed by the user. According to (Herlocker, Konstan, Borchers, & Riedl, 1999), the predicted rating for an active user a on an item i is defined as:

p_a,i = r̄_a + Σ_n∈N(a) sim(a,n)(r_n,i − r̄_n) / Σ_n∈N(a) |sim(a,n)|

where N(a) denotes the set of K-Nearest Neighbors of a among the users that have rated item i; r_n,i denotes the rating of item i by user n; r̄_a and r̄_n are the average ratings of user a and neighbor n, respectively; and sim(a,n) is the similarity between a and n.

CF algorithms produce personalized movie recommendations, as compared to methods that recommend "top-N movies", "most popular movies", or "highly rated movies" (Park, Park, Jung, & Lee, May 2015). Collaborative filtering is widely popular because of its simplicity and its ability to predict accurate results.
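As a concrete sketch of the user-based similarity and prediction scheme described above, the following minimal Python example computes a Pearson similarity over co-rated items and a neighborhood-weighted prediction. The rating matrix and all values are illustrative, not taken from the paper's experiment, and centering the similarity on co-rated items is one common implementation choice:

```python
import numpy as np

# Illustrative user-item rating matrix (rows: users, columns: items);
# np.nan marks items a user has not rated.
R = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, 5.0, 2.0,    1.0],
    [1.0, 2.0, 5.0,    4.0],
    [2.0, np.nan, 4.0, 5.0],
])

def pearson_sim(a, n):
    """Pearson correlation between users a and n over their co-rated items."""
    mask = ~np.isnan(R[a]) & ~np.isnan(R[n])
    ra, rn = R[a][mask], R[n][mask]
    da, dn = ra - ra.mean(), rn - rn.mean()
    denom = np.sqrt((da ** 2).sum() * (dn ** 2).sum())
    return 0.0 if denom == 0 else float((da * dn).sum() / denom)

def predict(a, i, k=2):
    """Predict user a's rating of item i from the k most similar raters of i."""
    raters = [n for n in range(len(R)) if n != a and not np.isnan(R[n, i])]
    neighbors = sorted(((pearson_sim(a, n), n) for n in raters), reverse=True)[:k]
    # Mean-center each neighbor's rating, weight by similarity, and
    # shift back by the active user's own average rating.
    num = sum(s * (R[n, i] - np.nanmean(R[n])) for s, n in neighbors)
    den = sum(abs(s) for s, _ in neighbors)
    r_bar_a = np.nanmean(R[a])
    return r_bar_a if den == 0 else r_bar_a + num / den

print(predict(0, 2))  # user 0 has not rated item 2
```

Mean-centering each neighbor's rating before weighting, as in the prediction formula, compensates for users who rate systematically high or low.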

Implementation

The MovieLens dataset available at the GroupLens website (Grouplens, 2018) was used for this recommender system. The dataset consists of 100,000 ratings applied to 9,000 movies by 600 users. The movies dataset consists of movie ids, titles, and genres; the ratings dataset consists of the ratings given by each user, and each user has a distinct user id.

Table 1: Ratings Dataset with userId and movieId

userId  movieId  rating  timestamp
1       5060     5       964984002
2       318      3       1445714835
2       333      4       1445715029
3       31       0.5     1306463578
4       106      4       986848784

The ratings are on a scale from 0.5 to 5. For the purposes of this implementation, the user preference scale has been related to the ratings as below:

Table 2: User preference – rating scale

User Preference     Rating scale
Strongly Like       5
Like                4
Average             3
Dislike             2
Strongly Dislike    1

The implementation is done using the Python open-source libraries pandas, numpy, and sklearn. The top-rated movies can simply be found from the movies that are rated the most and their average ratings.

To recommend movies using the k-NN method, the linear correlation between movies is calculated using the Pearson correlation. To calculate the coefficient, we need the average rating and the total number of ratings as calculated previously. To ensure statistical significance for the experiment, the following were removed from the dataset: movies with fewer than 50 ratings and users who provided fewer than 100 ratings. This was done because movies with a low count of ratings and a high rating value skewed the results and produced false positives. To calculate the correlation, the ratings and movie ids are first pivoted into a 2D matrix. Since not every user rated every movie, a significant number of cells have a null value. Following is an example of the converted matrix:

Figure 3: UserId – MovieId 2D matrix containing the ratings provided by each user

The similarity was then calculated from this matrix using Pearson's coefficient: the system takes a movie and calculates its similarity with other movies based on user ratings. The system was configured to take into consideration only movies that have more than 275 ratings; it was observed that a smaller number of ratings did not provide accurate results, primarily due to the lack of data. This threshold was determined by trial and error and may differ based on the data and requirements. A threshold significantly lower than 275 yielded more movie title recommendations, but at a trade-off in rating count; to ensure more accuracy, the current implementation prefers a higher rating count. Below is an example of the movies similar to "The Shawshank Redemption, (1994)":

Figure 4: Pearson coefficient for movies corresponding to "The Shawshank Redemption, (1994)"

The corresponding movie titles that are most like "The Shawshank Redemption, (1994)", as computed above, are:

Figure 5: Movie titles similar to "The Shawshank Redemption, (1994)"
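The pivot-and-correlate steps described above can be sketched with pandas. The toy ratings frame below is an illustrative stand-in for the MovieLens ratings file (the real data would come from pd.read_csv), and pivot_table plus corrwith is one plausible realization of the described computation rather than the paper's exact code:

```python
import pandas as pd

# Toy ratings in the shape of the MovieLens ratings file (illustrative values).
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
    "movieId": [10, 20, 30, 10, 20, 30, 10, 20, 40, 20, 40],
    "rating":  [5.0, 4.0, 1.0, 4.0, 5.0, 2.0, 1.0, 2.0, 5.0, 2.0, 4.0],
})

# Pivot into the userId x movieId matrix; cells a user never rated become NaN.
matrix = ratings.pivot_table(index="userId", columns="movieId", values="rating")

# Pearson correlation of every movie's rating column with a target movie.
target_id = 10
similar = matrix.corrwith(matrix[target_id]).sort_values(ascending=False)
print(similar.head())
```

The target movie correlates with itself at 1.0; in a full implementation the result would additionally be filtered by each movie's rating count, as the paper does with its 275-rating threshold.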

It should be noted that the movie at index 0 is the same movie (The Shawshank Redemption, (1994)) for which the similarity index is being calculated; its Pearson coefficient is 1 because it is the same movie. Thus, the most similar movies have a higher value of the Pearson coefficient. Other movies similar to "The Shawshank Redemption, (1994)" can be selected based on the total rating count as well as the coefficient value. The movies "Forrest Gump (1994)" and "Pulp Fiction (1994)" have the highest rating counts among all the displayed titles, and thus these are most likely to be similar to "The Shawshank Redemption, (1994)". These are the movie titles based on correlation.

Using the k-NN model, movie recommendations are made by calculating the distance between items and determining the closest neighbors. The neighbors with the least distance from the item are most likely to match the user's preference. The most popular neighbors are picked based on the total rating count. The number of neighbors k can be specified based on the available data and system requirements. The value of k was determined by experimenting with different values starting from 3, a reasonable number of neighbors to start with. In this implementation, the value of k was set to 5 because this value provided the most accurate results. The values below were observed for the evaluation metrics Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Fraction of Concordant Pairs (FCP):

Figure 6: Comparison of RMSE, MAE, and FCP for values k=3 and k=5 in this implementation of the recommender system

Below is another example: for the movie "Aladdin (1992)", recommendations were made, and the following are the recommendations generated by the model:

Figure 7: Movie recommendations generated with the k-NN model, showing distance values

Evaluation

A recommender system is only as good as its core recommender algorithm, and its performance and accuracy need to be measured before recommendations can be made to the user. All evaluation was completed offline, i.e. no user interaction was required, only the dataset, to calculate the evaluation metrics. This practice is referred to as Offline Analytics; its advantage is that, with no user interaction required, it is easy to implement at low computational cost (Chena & Liu, Dec 2017). The testing method used was k-fold cross-validation, implemented using Python scikit open-source libraries.

To determine the quality of the predictions and the accuracy of the results, the most common error metrics, Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), were used. RMSE measures the standard deviation of the errors the system makes in its predictions (Géron, 2017). MAE measures the average magnitude of the errors in a set of predictions but does not consider the direction of the error (positive or negative) (Chena & Liu, Dec 2017); it is the average of the absolute difference between actual and predicted values. Both metrics capture average model prediction error, but RMSE is more useful when there is a need to penalize large errors, since it gives high weight to large errors: errors are squared before they are averaged. RMSE and MAE are calculated as below:

Root Mean Square Error:

RMSE = √( (1/|Q|) Σ_(u,i)∈Q (r_ui − ȓ_ui)² )

Mean Absolute Error:

MAE = (1/|Q|) Σ_(u,i)∈Q |r_ui − ȓ_ui|

where Q is the test set, r_ui represents the user's true rating, and ȓ_ui represents the rating predicted by the recommender system (Chena & Liu, Dec 2017).

The dataset was split into a training dataset and a testing dataset. The method used for calculating similarities was the Pearson method, and the value of k used was 5 for the experiment below. The values of RMSE and MAE were calculated for multiple values of the test-set percentage. Below are the error values observed when k-NN was implemented and tested:

Figure 8: RMSE vs. percentage of data used for evaluation

Figure 9: MAE vs. percentage of data used for evaluation

It can be observed from the above evaluations that the error values decrease, and the prediction accuracy increases, with an increase in training data. The highest error-metric values were observed when the dataset was split into 10% training and 90% test sets, and the values of RMSE and MAE gradually decreased as the size of the training dataset was increased. This can be explained by the "Cold Start Problem" phenomenon: the k-NN algorithm relies heavily on past user ratings to make predictions, and as the model receives more data, its capability to make predictions and the quality of its results improve.

Another metric that was used to compute the quality of the predictions is the Fraction of Concordant Pairs (FCP), a measure that generalizes the known AUC metric to non-binary ordered outcomes (Koren & Sill, 2013). FCP measures the proportion of pairs of well-classified items:

FCP = n_c / (n_c + n_d)

where n_c = Σ_u n_cu and n_d = Σ_u n_du are the summations of concordant pairs and discordant pairs, respectively, over all users (Koren & Sill, 2013). Concordant pairs are the ones ranked correctly by the predictor ȓ_u. For each user u, the concordant pairs are calculated as:

n_cu = |{(i,j) : ȓ_ui > ȓ_uj and r_ui > r_uj}|

The discordant pairs n_du are the ones that are incorrectly ranked and are calculated in a similar fashion to the concordant pairs.

FCP was calculated for the model after the dataset was split into training and testing datasets, just as RMSE and MAE were calculated above. The observed results are shown below:

Figure 10: FCP vs. percentage of data used for evaluation

According to Koren & Sill, the higher the FCP, the better the results. In this experiment, it was observed that FCP rose significantly when the size of the training dataset increased. This also leads to the conclusion that the k-NN algorithm provides better quality results when it receives more data.

As previously mentioned, there are several methods to calculate similarities between items. Two of those were implemented as part of this experiment to evaluate which one provides better quality results. The two similarity measuring techniques used were the Cosine similarity and

the Pearson similarity. Below are the results obtained for RMSE and MAE over 5 iterations:

Table 3: Comparison of evaluation metrics for the Cosine similarity and the Pearson similarity

              RMSE (Cosine)  RMSE (Pearson)  MAE (Cosine)  MAE (Pearson)
Iteration 1   1.1678         1.0388          0.9241        0.8047
Iteration 2   1.1747         1.035           0.9287        0.8042
Iteration 3   1.1758         1.0362          0.9279        0.7983
Iteration 4   1.1707         1.0332          0.9207        0.7989
Iteration 5   1.178          1.0461          0.9306        0.8088

It is evident from the results that the Pearson similarity provides results with lower error rates, and it was thus the preferred method for computing similarity in the final implementation to obtain movie predictions.

Though the Pearson similarity does provide better results, a trade-off in computation cost was observed: the time taken by the system to fit and to test the model is significantly greater with the Pearson similarity than with the cosine similarity. Below are the mean and standard deviation of the time taken by the two methods over all the conducted iterations:

Table 4: Comparison of time taken to fit and test the model for the Cosine similarity and the Pearson similarity

       Fit time (Cosine)  Fit time (Pearson)  Test time (Cosine)  Test time (Pearson)
Mean   4.62               4.84                4.04                4.06
Std    0.05               0.07                0.05                0.06

It should be noted that while the difference in time taken does not impact this experiment, in a practical implementation with distributed systems any time difference can cause an exponential delay in delivering recommendations to a customer. The Pearson similarity is better, but the designer of a recommender system should take into consideration the requirements of the customers and the compute time of the system, to strike a balance between time taken and recommendation quality in a real-time scenario. It is often recommended to compute the similarity between users or items offline to save computation effort.

Some of the challenges encountered in evaluating the performance of the recommender system were:

• Determining the appropriate combination of similarity index and value of k to make predictions, because both factors greatly impact error rates.
• Computing the value of k to reduce the prediction error: greater values of k led to over-fitting of the model, and the predictions were insignificant.

Lessons Learned

One of the most significant lessons learned in this project was the selection of the value of k to maintain a balance between over-fitting and under-fitting the data. A higher value of k provided more accurate results but is computationally more expensive; this did not significantly impact performance in this experiment but may not be suitable for distributed systems. After selecting a lower value of k, it was observed that noise impacted the results and the predictions were not accurate.

To calculate the similarities, different methods were used. It was observed that the cosine coefficient did not result in accurate predictions because of the null (NaN) values in the data, which signify ratings that have not been provided by a user. The Pearson (centered cosine) similarity considers the weighted average of the ratings by their similarities and is thus able to reduce the noise caused by the null values, whereas the cosine coefficient treats those values as 0; treating null values as 0 caused the model to provide inaccurate results.

Another challenge faced during the implementation of this system was determining the statistical significance of the results. The statistical distribution of the data was used to determine the top movies and the movies with the largest rating counts. It was observed that 1% of the movies received more than 66 ratings, and this information was used to filter out data with less significance for experimentation purposes: movies with fewer than 50 ratings and users who provided fewer than 100 ratings were excluded from the data to determine significant correlations.

Another challenge was determining the appropriate recommendations from the list of movie titles based on correlation as well as k-NN. The model predicts the recommendations, but it is the designer's responsibility to display the top recommendations rather than a complete list. If the user is presented with many recommendations, we circle back to the very problem we set out to remedy: assisting users in choosing items based on past preference. From a design perspective, it was observed that if the number of ratings for a movie remained significantly greater, then the predictions made by the model were more accurate; in contrast, if the rating count was low, the predictions were found to be insignificant and rather just a "list of movies".

During the prototyping phase of the implementation, when the complete dataset was not used, the system ran into the "Cold Start" problem. Since the model relies heavily on explicit user ratings, there is not enough data at the start of the system's life to make predictions, and it was observed that the recommendations were inaccurate. Several solutions, such as Singular Value Decomposition, Matrix Factorization, and the rating comparison strategy (Zhou &

Nadaf, Apr 2017), have been proposed to solve the "Cold Start" problem. Another method is to use a Random Walk with Restart along with the recommendation algorithm to generate item similarities (Sakarkar & Deshpande, May 2016). Zhou & Nadaf suggested Word Embedding, proposed by Google in 2013, which uses a parameterized word function to convert a sparse representation into a dense one. Zhou & Nadaf used the MovieLens dataset to conduct experiments testing their proposed method, Embedded Collaborative Filtering, and observed that ECF outperforms the other methods in "Cold Start" cases (Zhou & Nadaf, Apr 2017). The method shows great potential for solving the "Cold Start" problem in applications with sparse data, but it does not prove effective for systems with dense data.

Conclusion

In this paper, a system was implemented to make movie recommendations to a user based on past ratings. The machine learning algorithm K-Nearest Neighbors was used for the implementation: the model computes a similarity between items and then selects the k most similar items based on the user's past ratings. All predictions were computed using open-source Python machine learning libraries. The results were compared and evaluated to determine the value of k. The evaluation was performed using Root Mean Square Error, Mean Absolute Error, and Fraction of Concordant Pairs; the lower the RMSE and MAE, and the higher the FCP, the better the results. It was observed that calculating the similarities takes significant computation time and memory, so it is recommended to perform this task offline; if the similarities are computed offline, the user can be provided with results much faster. It was also observed that the k-NN algorithm produced results with lower error rates as the training data increased. Thus, to provide predictions to a new user, it is recommended to use k-NN in conjunction with other algorithms. To provide recommendations to a new user, it is also possible to use the user's preference data from social networking platforms, but this should be done with the user's consent to maintain ethical standards in data sharing.

It was also observed that, despite the availability of many open-source libraries and packages, it is the designer's responsibility to configure the system to provide predictions correctly, and the designer should be familiar with the datasets. The observations were not limited to evaluating the current algorithm; some key findings were also presented as scope for future work. It is possible to achieve better results by combining user behavior with explicit user ratings. This can be done by combining K-Nearest Neighbors with other machine learning algorithms to provide better quality predictions, which is also one of the key areas of research in the industry.

References

Adomavicius, G., & Tuzhilin, A. (June 2005). Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 734–749. Retrieved from https://ieeexplore-ieee-org.ezproxy.library.tamu.edu/document/1423975

Breese, J., Heckerman, D., & Kadie, C. (July 1998). Empirical analysis of predictive algorithms for collaborative filtering. Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, Madison, Wisconsin.

Chena, M., & Liu, P. (Dec 2017). Performance Evaluation of Recommender Systems. International Journal of Performability Engineering, 1246–1256.

Géron, A. (2017). Hands-On Machine Learning with Scikit-Learn and TensorFlow. Sebastopol, CA: O'Reilly Media, Inc.

Grouplens. (2018). Retrieved from https://grouplens.org/

Haruna, K., Akmar Ismail, M., Damiasih, D., Sutopo, J., & Herawan, T. (Oct 2017). A collaborative approach for research paper recommender system. PLoS ONE. Retrieved from https://doi.org/10.1371/journal.pone.0184516

Herlocker, J. L., Konstan, J., Borchers, A., & Riedl, J. (1999). An algorithmic framework for performing collaborative filtering. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 230–237). Berkeley, California, USA: ACM. Retrieved from http://doi.acm.org/10.1145/312624.312682

Koren, Y., & Sill, J. (2013). Collaborative Filtering on Ordinal User Feedback. Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, pp. 3022–3026. Retrieved from http://dl.acm.org/citation.cfm?id=2540128.2540570

Maheshwari, A. (2015). Data Analytics Made Accessible. Amazon Digital Services LLC.

Melville, P., & Sindhwani, V. (April 2010). Recommender Systems. Encyclopedia of Machine Learning, 1.

Park, Y., Park, S., Jung, W., & Lee, S.-g. (May 2015). Reversed CF: A fast collaborative filtering algorithm using a k-nearest neighbor graph. Expert Systems with Applications, Volume 42, Issue 8, 4022–4028.

Phorasim, P., & Yu, L. (2017). Movies recommendation system using collaborative filtering and k-means. International Journal of Advanced Computer Research, Vol 7(29).

Portugal, I., Alencar, P., & Cowan, D. (May 2018). The Use of Machine Learning Algorithms in Recommender Systems: A Systematic Review. Expert Systems with Applications, Volume 97, 205–227.

Recommender Systems. (2014). In G. Kembellec. John Wiley & Sons, Incorporated. Retrieved from ProQuest Ebook Central, https://ebookcentral-proquest-com.ezproxy.library.tamu.edu/lib/tamucs/detail.action?docID=1884200.

Resnick, P., Iacovou, N., Sushak, M., Bergstrom, P., & Reidl, J. (1994a). GroupLens: An open architecture for collaborative filtering. ACM Press, New York, NY, USA.

Sakarkar, G., & Deshpande, S. (May 2016). Clustering Based Approach to Overcome Cold Start Problem in Intelligent e-Learning system. International Journal of Latest Trends in Engineering and Technology (IJLTET).

Zhou, Y., & Nadaf, A. (Apr 2017). Embedded Collaborative Filtering for "Cold Start" Prediction. Q.I. Leap Analytics Inc.
